PREFACE TO VOLUME II


This volume falls into five sections. The first, comprising chapters 17 to 20, deals
with Estimation. The second, comprising chapters 21, 23, 24 and 26 to 28, covers the
Theory of Statistical Tests, including the Analysis of Variance and Multivariate Analysis. 
The third, consisting of chapter 22, deals with Regression Analysis and completes the 
account of statistical relationship begun in chapters 13 to 16 of Volume I. In the fourth, 
chapter 25, I have tried to give an introductory account of the reaction of theoretical 
considerations on the Design of Statistical Inquiries. Finally, the fifth, comprising chapters 
29 and 30, deals with the Analysis of Time-Series. 

The literature of statistical theory is now so vast that it seemed worth while devoting 
considerable space to a bibliography, which is given in Appendix B. Although it is far 
from complete, I hope that it will serve its purpose in guiding the student to the main
sources.

The chief problem in the writing of this volume arose in connection with the logic of 
statistical inference. Whenever possible I have kept the treatment objective. It is, 
I consider, unfair in a book of this kind not to present all sides of a case, particularly when 
there is so much disagreement among the authorities. Some day I hope to show that 
this disagreement is more apparent than real, and that all the existing theories of inference 
in probability differ essentially only in matters of taste in the choice of postulates. But 
this book is not the place for such work, and for the present I am content to state the 
position and to leave the reader to exercise his own choice. 

The difficulty became most acute in dealing with confidence intervals and fiducial 
inference, where two approaches which at first sight appear identical can lead to different 
results. Rather than try to reconcile them I have written a separate chapter on each. 
Professor E. S. Pearson was kind enough to read the manuscript of chapter 19 and Professor 
R. A. Fisher that of chapter 20, so that I think their respective views are, at any rate, not 
misrepresented. I am very grateful to them both for their help in this connection. 

My thanks are also due to Mr. P. A. Moran and Mr. A. J. H. Morrell, who cheerfully 
undertook to help with the proof reading and to whose painstaking scrutiny I owe the 
removal of a number of obscurities and errors. I shall be grateful to any reader who 
detects and notifies me of any further slips which have evaded us. Once again I have also 
to thank the publishers and the printers for the trouble they have taken in the production
of the finished work. 

M. G. K. 

London, 

April, 1946.
CHAPTER 17

ESTIMATION: LIKELIHOOD 


The Problem 

17.1. On several occasions in previous chapters we have encountered the problem 
of estimating from a sample the values of the parameters of the parent population. We 
have hitherto dealt on somewhat intuitive lines with such questions as arose—for example, 
in the theory of large samples we have taken the means and moments of the sample to be
satisfactory estimates of the corresponding means and moments in the parent. 

We now proceed to study this branch of the subject in more detail. In the earlier 
part of the present chapter we shall examine the sort of criteria which are required of 
a “ good ” estimate and discuss the question whether there exist “ best ” estimates in 
any acceptable sense of the term. In the remainder of the chapter and in Chapter 18 
we shall consider various methods of obtaining estimates with the required properties. 
In Chapters 19 and 20 we shall look at the same problem from a rather different point of 
view and discuss the theories of confidence intervals and fiducial limits. 

17.2. It will be evident that if a sample is not random and nothing precise is known 
about the nature of the bias operating when it was chosen, very little can be inferred from 
it about the parent population. Certain conclusions of a trivial kind are sometimes
possible; for instance, if we take ten turnips from a pile of 100 and find that they weigh ten
pounds altogether, the mean weight of turnips in the pile must be greater than one-tenth of
a pound; but such information is rarely of value, and estimation based on biassed samples
remains very much a matter of individual opinion and cannot be reduced to exact and
objective terms. We shall therefore confine our attention to random samples only. Our
general problem, in its simplest terms, is then to estimate the value of a parameter in the
parent from the information given by the sample. In the first instance we consider
the case when only one parameter is to be estimated. The case of several parameters
will be discussed later.

17.3. Let us in the first place consider what we mean by “ estimation ”. We know, 
or assume as a working hypothesis, that the parent population is distributed in a form
which would be completely determinate if we knew the value of some parameter θ. We
are given a sample of values x_1, ..., x_n. We require to determine, with the aid of the
x's, a number which can be taken to be the value of θ, or a range of numbers which can
be taken to include that value.

Now a single sample, considered by itself, may be rather improbable, and any estimate
based on it may therefore differ considerably from the true value of θ. It appears,
therefore, that we cannot expect to find any method of estimation which can be guaranteed
to give us a close estimate of θ on every occasion and for every sample. We must
content ourselves with formulating a rule which will give good results “ in the long run ”
or “ on the average ”, or which has “ a high probability of success ”, phrases which
express the fundamental fact that we have to regard our method of estimation as generating
a population of estimates and to assess its merits according to the properties of this
population.

17.4. It will clarify our ideas considerably if we draw a distinction between the
method or rule of estimation, which, following Pitman, we shall call an Estimator, and the
value to which it gives rise in particular cases, the Estimate. The distinction is the same
as that between a function f(x), regarded as defined for a range of the variable x, and the
particular value which the function assumes, say f(a), for a specified value of x equal to a.
Our problem is not to find estimates, but to find Estimators. We do not reject a method
because it gives a bad result in a particular case (in the sense that the estimate differs
materially from the true value). We should only reject it if it gave bad results in the long
run, that is to say, if the population of possible values of the estimator were seriously
discrepant with the value of θ. The merit of the estimator is judged by the population
of estimates to which it gives rise. It is itself a random variable and has a distribution
to which we shall frequently have occasion to refer.


17.5. In the theory of large samples we have often taken as an estimator of a parameter
θ a statistic t calculated from the sample in exactly the same way as θ is calculated
from the population, e.g. the sample-mean is taken as an estimate of the parent mean.
Let us examine how this procedure can be justified. Consider the case when the parent
population is

    dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \theta)^2\}\,dx, \qquad -\infty < x < \infty. \qquad (17.1)

Requiring an estimator for the parent mean θ, we take

    t = \frac{1}{n} \sum_{j=1}^{n} x_j. \qquad (17.2)

The distribution of t is

    dF = \frac{\sqrt{n}}{\sqrt{2\pi}} \exp\{-\tfrac{n}{2}(t - \theta)^2\}\,dt, \qquad (17.3)


that is to say, t is distributed normally about θ with variance 1/n. We notice two things
about this distribution: (a) it has a mean (and median and mode) at the true value θ,
and (b) as n increases, the scatter of possible values of t about θ becomes smaller, so that
the probability that a given t differs by more than a fixed amount from θ decreases. We
may say that the accuracy of the estimator increases as n increases, or simply with n.
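
The two features just noted, a centre at the true value and a scatter shrinking as 1/n, may be checked by repeated sampling. The following is a minimal sketch in Python, not part of the original argument; the function name, the value θ = 2, the number of replications and the seed are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_mean_estimator(theta, n, replications=4000, seed=1):
    """Draw repeated samples of size n from the normal parent (17.1),
    with unit variance and mean theta, and return the mean and the
    variance of the resulting population of sample means."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))
        for _ in range(replications)
    ]
    return statistics.fmean(estimates), statistics.pvariance(estimates)


# Illustrative values only: theta = 2.0 is an arbitrary choice.
for n in (4, 16, 64):
    centre, spread = simulate_mean_estimator(theta=2.0, n=n)
    print(n, round(centre, 3), round(spread, 4), "theoretical variance:", 1 / n)
```

The simulated variances should lie close to 1/n, up to sampling fluctuation in the simulation itself.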


17.6. Generally, it will be clear that the phrase “ accuracy increasing with n ” has
a definite meaning whenever the sampling distribution of t has a variance which decreases
with 1/n and a central value which is either identical with θ or differs from it by a quantity
which also decreases with 1/n. Many of the estimators with which we are commonly
concerned are of this type, but there are exceptions. Consider, for example, the Cauchy
population

    dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty. \qquad (17.4)

The mean (assuming that we conventionally agree that it exists) is at x = θ. But if we
try to estimate θ by the mean-statistic t we have, for the distribution of t,

    dF = \frac{1}{\pi} \frac{dt}{1 + (t - \theta)^2}, \qquad -\infty < t < \infty. \qquad (17.5)

(Cf. Example 10.1, vol. I, pp. 233-4.) In this case the distribution of t is the same
as that of any single value of the sample, and does not increase in accuracy as n increases.
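
The failure of the mean-statistic in the Cauchy case can be seen numerically. Since the variance does not exist, the sketch below measures scatter by the interquartile range of the simulated means, which for the distribution (17.5) is 2 whatever the value of n. This is an illustrative Python sketch only; the function name, the inverse-transform sampler, the replication count and the seed are assumptions.

```python
import math
import random
import statistics


def cauchy_mean_iqr(theta, n, replications=4000, seed=1):
    """Interquartile range of the mean of n Cauchy observations centred
    at theta, estimated from repeated samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        # Inverse-transform sampling: theta + tan(pi (U - 1/2)) is Cauchy.
        sample = [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        means.append(sum(sample) / n)
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


# The scatter of the mean should stay near 2 and not shrink with n.
for n in (1, 10, 100):
    print(n, round(cauchy_mean_iqr(0.0, n), 2))
```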
Consistence

17.7. The property of possessing increasing accuracy is evidently a very desirable
one; and indeed, if the variance of the sampling distribution decreases with increasing
n it is necessary that its central value should tend to θ, for otherwise the estimator would
have values differing systematically from the true value and would be useless, not to say
dangerous. We therefore formulate our first criterion for a suitable estimator as follows:

An estimator t_n, computed from a sample of n values, will be said to be a consistent
estimator of θ if, for any positive ε and η, however small, there is some N such that the
probability that

    |t_n - \theta| < \varepsilon \qquad (17.6)

is greater than 1 − η for all n > N. In the notation of the theory of probability,

    P\{|t_n - \theta| < \varepsilon\} > 1 - \eta, \qquad n > N. \qquad (17.7)

The definition bears an obvious analogy to the definition of convergence in the mathematical
sense. Given any fixed small quantity ε we can find a large enough sample number
such that for all samples over that size the probability that t differs from the true value
by more than ε is as near zero as we please. t_n is said to converge in probability to θ. Thus
t is a consistent estimate of θ if it converges to θ in probability.

Example 17.1 

The sample mean is a consistent estimator of the parameter θ in the population (17.1).
This we have already established in general argument, but more formally the proof would
proceed as follows:

Suppose we are given ε. From (17.3) we see that (t − θ)√n is distributed normally
about zero with unit variance. Thus the probability that |(t − θ)√n| < ε√n is the
value of the normal integral between limits ±ε√n. Given any positive η we can
always take n large enough for this quantity to be greater than 1 − η and it will continue
to be so for any larger n. N may therefore be determined and the inequality (17.7) is
satisfied.
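
The determination of N in this example can be made concrete. The normal integral between ±ε√n equals erf(ε√(n/2)), so one may simply step n upward until the integral exceeds 1 − η. The Python sketch below does this; the function names and the numerical values of ε and η are assumptions made for illustration.

```python
import math


def coverage(eps, n):
    """Normal integral between the limits -eps*sqrt(n) and +eps*sqrt(n),
    i.e. P{|t - theta| < eps} for the mean of n values from (17.1)."""
    return math.erf(eps * math.sqrt(n / 2.0))


def first_sufficient_n(eps, eta):
    """Smallest sample number for which the probability in (17.7)
    exceeds 1 - eta; it then holds for every larger n, because the
    coverage increases with n."""
    n = 1
    while coverage(eps, n) <= 1.0 - eta:
        n += 1
    return n


# Illustrative values: eps = 0.1 and eta = 0.05.
print(first_sufficient_n(0.1, 0.05))
```

With ε = 0.1 and η = 0.05 the loop stops at n = 385, the first n for which ε√n exceeds the familiar normal deviate 1.96.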

Example 17.2 

Suppose we have a statistic t_n whose mean value differs from θ by order n^{-1}, whose
variance v_n is of order n^{-1} and which tends to normality as n increases. Clearly
(t_n − θ)/√v_n will then tend to a normal variate with zero mean and unit variance, so that,
since v_n tends to zero, t_n − θ will tend to zero in probability and t_n will be consistent.
This covers a great many statistics encountered in practice.

Unbiassed Estimators 

17.8. The property of consistence is a limiting property, that is to say, it concerns
the behaviour of an estimator as the sample number tends to infinity. It requires nothing
of the behaviour for finite n, and if there exists one consistent estimator t_n we may construct
infinitely many others; e.g.

    \frac{n - a}{n - b}\, t_n

is also consistent. We have seen that in some circumstances a consistent estimator of the
mean is the sample mean

    \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j. \qquad (17.8)
But so is



(17.9) 


Why do we prefer one to the other ? Intuitively it seems absurd to divide the sum of 
n quantities by anything other than their number n. We shall see in a moment, however, 
that intuition is not a very reliable guide on such matters. There are reasons for preferring 


    \frac{1}{n - 1} \sum_{j=1}^{n} (x_j - \bar{x})^2 \qquad (17.10)

to

    \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2 \qquad (17.11)


as an estimator of the parent variance, notwithstanding that the latter is the sample 
variance. 


17.9. Consider the sampling distribution of an estimator t. If the estimator is
consistent, its distribution must, for large samples, have a central value in the neighbourhood
of θ. We may choose among the field of consistent estimators by requiring that
θ shall be equated to this central value not merely for large, but for all samples. Whether
we choose as the appropriate central value the mean, the median or the mode is to some
extent a matter of taste. We shall consider below what follows if we select the mode
(which gives us the maximum likelihood estimators). For the present we discuss the mean.

If we require that for all n the mean value of t shall be θ, we define what is known as
an unbiassed estimator:

    E(t) = \theta. \qquad (17.12)

This is an unfortunate word, like so many in statistics. There is nothing except convenience
to exalt the arithmetic mean above other measures of location as a criterion of
bias. We might equally well have chosen the mode as determining the “ unbiassed ”
estimator, in which case the mean estimator would be “ biassed ” whenever it gave a different
result. Since the use of “ unbiassed ” in connection with the mean is fairly widespread,
however, we shall continue to use it.*


Example 17.3 
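Since

    E(\bar{x}) = E\Big\{\frac{1}{n} \sum x\Big\} = \frac{1}{n} \sum E(x) = \mu'_1,

the mean-statistic is an unbiassed estimator of the parent mean whenever the latter exists.
But the sample-variance is not an unbiassed estimator of the parent variance. We have

    E\Big\{\sum (x - \bar{x})^2\Big\} = E\Big\{\sum \Big[x - \frac{1}{n} \sum x\Big]^2\Big\}
        = E\Big\{\frac{n - 1}{n} \sum_{j} x_j^2 - \frac{1}{n} \sum_{j \ne k} x_j x_k\Big\}
        = (n - 1)\mu'_2 - (n - 1)\mu'^{2}_1
        = (n - 1)\mu_2.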

* The word has already occurred in vol. I, p. 200, in this sense. It may be spelt with either one
or two “ s ”s. My usage, I am afraid, is not consistent, but in this volume I use two.
Thus (1/n) Σ(x − x̄)² has a mean value ((n − 1)/n) μ_2. On the other hand, an unbiassed estimator
is given by

    \frac{1}{n - 1} \sum (x - \bar{x})^2

and for this reason it is sometimes preferred to the sample variance. There are other
reasons which will appear when we come to study the analysis of variance.
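
The bias of the sample variance can be verified exactly, without simulation, by enumerating every possible sample of n values drawn with replacement from a small discrete population. The Python sketch below is illustrative only; the population {0, 1, 2, 5} and the function name are assumptions, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimates(population, n):
    """Return (mu_2, E[sum of squares / n], E[sum of squares / (n - 1)])
    for samples of n drawn independently, each value of the population
    being equally probable."""
    values = [Fraction(v) for v in population]
    size = len(values)
    mu = sum(values) / size
    mu2 = sum((v - mu) ** 2 for v in values) / size  # parent variance
    total_biassed = Fraction(0)
    total_unbiassed = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):  # every equally likely sample
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)
        total_biassed += ss / n
        total_unbiassed += ss / (n - 1)
        count += 1
    return mu2, total_biassed / count, total_unbiassed / count


mu2, e_biassed, e_unbiassed = expected_variance_estimates([0, 1, 2, 5], n=3)
print(mu2, e_biassed, e_unbiassed)
```

The divisor n gives a mean value of exactly ((n − 1)/n) μ_2, while the divisor n − 1 reproduces μ_2.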

Efficient Estimators 

17.10. In general there will exist more than one consistent estimator of a parameter,
even if we confine ourselves only to unbiassed estimators. Consider once again the estimation
of the mean of a normal population with known variance. The sample mean is
consistent and unbiassed. We will now prove that the same is true of the median.

Consideration of symmetry is enough to show that the median is an unbiassed estimate
of the parent mean, which is, of course, the same as the parent median. For large n the
distribution of the median tends to the normal form (cf. Example 9.7, vol. I, p. 213),

    dF \propto \exp\{-2 n f_1^2 (x - \theta)^2\}\,dx, \qquad (17.13)

where f_1 is the median ordinate of the parent, in our present case 1/√(2π) = 0.3989. The
variance tends to zero and the estimator is consistent. Its variance is π/2n.

17.11. We are therefore at liberty to seek for further criteria to choose between
estimators with the common property of consistence. Such a criterion arises naturally
if we consider the sampling variances of the estimators. Generally speaking, the estimator
with the smaller variance will be grouped more closely round the value θ; this will certainly
be so for distributions of the normal type. An estimator with a smaller variance will
therefore deviate less, on the average, from the true value than one with a larger variance.
Hence we may reasonably regard it as better or more efficient.

If, of two consistent estimators t_1 and t_2, we have var t_1 < var t_2 for every n, then t_1 is
more efficient than t_2 for all sample sizes. It is possible to have var t_1 < var t_2 for some
ranges of n and var t_1 > var t_2 for others, in which case the estimators are more or less
efficient in different ranges.

In the case of mean and median we have, for any n,

    \operatorname{var}(\text{mean}) = \frac{\sigma^2}{n}, \qquad (17.14)

and for large n

    \operatorname{var}(\text{median}) = \frac{\pi \sigma^2}{2n}, \qquad (17.15)

where σ² is the parent variance. Since π/2 = 1.57 > 1 the mean is more efficient than
the median for large n at least. For small n we have to work out the variance of the median.
The following values, expressed in units of σ²/n, may be obtained from those given in
Table XXIII of Tables for Statisticians and Biometricians, Part II:

    n               2       3       4       5
    var (median)    1.00    1.35    1.19    1.44

It appears that the mean is always more efficient than the median in estimating the parameter
θ for the normal distribution (17.1).
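
For odd n the variance of the median can also be obtained by direct quadrature, since the median of n = 2m + 1 values is the middle order statistic, with frequency function proportional to φ(x){Φ(x)(1 − Φ(x))}^m. The sketch below, in Python, applies Simpson's rule to this density for a parent with unit variance. It is an illustration only; the function name, the integration limits and the number of steps are assumptions.

```python
import math
from statistics import NormalDist


def median_variance_odd(n, half_width=8.0, steps=4000):
    """Variance of the median of n = 2m + 1 standard normal values,
    by Simpson's rule on the exact density of the middle order
    statistic. The mean is zero by symmetry, so the variance is E(x^2)."""
    m = (n - 1) // 2
    const = math.factorial(n) / (math.factorial(m) ** 2)
    z = NormalDist()

    def integrand(x):
        p = z.cdf(x)
        return x * x * const * z.pdf(x) * (p * (1.0 - p)) ** m

    h = 2.0 * half_width / steps  # steps must be even for Simpson's rule
    total = integrand(-half_width) + integrand(half_width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(-half_width + i * h)
    return total * h / 3.0


# n * var(median) is the variance in units of sigma^2 / n.
for n in (1, 3, 5, 11, 101):
    print(n, round(n * median_variance_odd(n), 3))
print("limiting value pi/2:", round(math.pi / 2, 3))
```

For n = 3 the product n var(median) comes out close to 1.35, and as n grows it moves towards the limiting value π/2 of (17.15).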
Example 17.4

For the Cauchy distribution

    dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty,

we have already seen that the sample mean is not a consistent estimator. However, for
the median in large samples we have, since the median ordinate is 1/π,

    \operatorname{var}(\text{median}) = \frac{\pi^2}{4n}.

It is seen that the median is consistent, and although direct comparison with the mean
is not possible because the latter does not possess a sampling variance, the median is evidently
a better estimator for θ than the mean. This provides an interesting contrast with
the case of the normal parent, particularly in view of the similarity of the parent frequency-
distributions.


17.12. In some cases, as we shall see below, there exist consistent estimators whose
sampling variance for large samples is less than that of any other such estimator. We
shall call such estimators most-efficient. When they exist they provide a standard of
measurement of efficiency. In fact, if t_2 has variance v_2 and the most-efficient estimator
t_1 has variance v_1, the efficiency E of t_2 is defined as

    E = \frac{v_1}{v_2}. \qquad (17.16)

It will be seen later that in normal samples the mean is a most-efficient estimator, so that
the efficiency of the median for such samples is

    \frac{\sigma^2}{n} \Big/ \frac{\pi \sigma^2}{2n} = \frac{2}{\pi} = 0.637.


17.13. If we have a sample of 100 members the variance of the median (assuming 
normality) will be about the same as that of the mean in only 64 members. Thus, if 
sampling variance be accepted as a criterion of accuracy of estimation, the use of the median 
instead of the mean sacrifices about 36 observations in 100. It is not possible to economise 
by using a different estimator than the mean. 
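
The arithmetic behind the figures 64 and 36 is short enough to set out in code. The following Python sketch is illustrative; the function names are assumptions, and the variances used are those of (17.14) and (17.15).

```python
import math


def efficiency(var_most_efficient, var_other):
    """Efficiency as defined in (17.16): ratio of the variance of the
    most-efficient estimator to that of the estimator considered."""
    return var_most_efficient / var_other


def equivalent_sample_size(n, eff):
    """Number of observations the most-efficient estimator would need
    to match the other estimator used on n observations."""
    return eff * n


sigma2, n = 1.0, 100
e_median = efficiency(sigma2 / n, math.pi * sigma2 / (2 * n))
print(round(e_median, 3))                           # efficiency of the median
print(round(equivalent_sample_size(n, e_median)))   # equivalent number for the mean
```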

Other things being equal, the estimator with the greater efficiency is undoubtedly
the one to use. But sometimes other things are not equal. It may, and does, happen
that a most-efficient estimate derived from t_1 is more troublesome to calculate than an
alternative t_2. The extra labour involved in calculation may be greater than the saving
in dealing with a smaller sample number, particularly if there are plenty of further
observations to hand.


Example 17.5 

Consider the estimation of the standard deviation of a normal population with variance
σ² and unknown mean. Two possible estimators are the standard deviation of the sample
(or the square-root of Σ(x − x̄)²/(n − 1) if it is desired to use an unbiassed estimator)
and the mean deviation of the sample multiplied by √(π/2) (cf. 5.20). The latter is
easier to calculate, as a rule, and if we have plenty of observations (as, for example, if we
are finding the standard deviation of a set of barometric records and the addition of further
members to the sample is merely a matter of turning up more records) it may be worth
while estimating from the mean-deviation rather than from the standard deviation.

In normal samples the variance of the mean-deviation is (9.13)

    \frac{2\sigma^2 (n - 1)}{n^2 \pi} \Big\{\frac{\pi}{2} + \sqrt{n(n - 2)} - n + \arcsin \frac{1}{n - 1}\Big\} \sim \frac{\sigma^2}{n}\Big(1 - \frac{2}{\pi}\Big). \qquad (17.17)

The variance of the estimator from the mean deviation is then approximately

    \frac{\sigma^2}{n}\Big(\frac{\pi - 2}{2}\Big). \qquad (17.18)

Now the variance of the standard deviation is (9.22) σ²/2n, and we shall see later that it
is a most-efficient estimator. Thus the efficiency of the estimator from the mean deviation is

    \frac{\sigma^2}{2n} \Big/ \frac{\sigma^2}{n}\Big(\frac{\pi - 2}{2}\Big) = \frac{1}{\pi - 2} = 0.876.


The accuracy of the estimate from the mean deviation of a sample of 1000 is then about 
the same as that from the standard deviation of a sample of 876. If it is easier to calculate 
the m.d. of 1000 observations than the s.d. of 876 and there is no shortage of observations, 
it may be more convenient to use the former. 

It has to be remembered, nevertheless, that in adopting such a procedure we are 
deliberately wasting information. By taking greater pains we could improve the efficiency 
of our estimate from 0.876 to unity, or by about 14 per cent of the former value.
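
The exact expression (17.17) and its limiting form can be compared numerically. The Python sketch below evaluates the exact variance of the mean deviation, multiplies it by π/2 to obtain the variance of the derived estimator of the standard deviation, and sets it against the large-sample variance σ²/2n of the sample standard deviation. The function names are assumptions, and the formula requires n to be at least 2.

```python
import math


def var_mean_deviation(n, sigma=1.0):
    """Exact variance of the mean deviation in normal samples, as in (17.17)."""
    bracket = (math.pi / 2 + math.sqrt(n * (n - 2)) - n
               + math.asin(1.0 / (n - 1)))
    return 2.0 * sigma ** 2 * (n - 1) / (n ** 2 * math.pi) * bracket


def efficiency_of_mean_deviation(n, sigma=1.0):
    """Ratio of sigma^2 / 2n (large-sample variance of the standard
    deviation) to the variance of the mean deviation times sqrt(pi/2)."""
    var_from_md = (math.pi / 2) * var_mean_deviation(n, sigma)
    return (sigma ** 2 / (2 * n)) / var_from_md


for n in (10, 100, 1000, 10000):
    print(n, round(efficiency_of_mean_deviation(n), 4))
print("limit 1/(pi - 2):", round(1 / (math.pi - 2), 4))
```

Because σ²/2n is itself a large-sample value, the ratio printed for small n is indicative only; for large n it settles at 1/(π − 2) = 0.876.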


Sufficient Estimators 

17.14. The comparison of the efficiencies of two estimators, as measured by their
variances, may be made for any n, but the absolute efficiency as defined in 17.12 by relation
to a most-efficient estimator is in the main a limiting property. We shall see below (17.36)
that the definition may be extended to small samples and to non-normal variation, but
most-efficient estimators for finite n do not exist so frequently in statistical practice
as in the limiting case of large samples. Sometimes, however, there are estimators which
may be regarded as the “ best ” for samples of any size, and we proceed to consider
them.

Before doing so, we prove that, in the limit, all most-efficient estimators tend to
equivalence.

More precisely, if two most-efficient estimators t_1 and t_2 tend in the limit to be distributed
in the bivariate form

    dF \propto \exp\Big[-\frac{1}{2v(1 - \rho^2)} \big\{(t_1 - \theta)^2 - 2\rho (t_1 - \theta)(t_2 - \theta) + (t_2 - \theta)^2\big\}\Big]\, dt_1\, dt_2, \qquad (17.19)

then the correlation ρ = 1. Here v is the variance of each estimator.

Consider the estimator

    u_1 = \tfrac{1}{2}(t_1 + t_2).

Clearly u_1 is consistent since t_1 and t_2 are both so. Putting

    u_2 = \tfrac{1}{2}(t_1 - t_2)

we have, for the joint distribution of u_1 and u_2,

    dF \propto \exp\Big[-\frac{1}{2v(1 - \rho^2)} \big\{2(1 - \rho)(u_1 - \theta)^2 + 2(1 + \rho) u_2^2\big\}\Big]\, du_1\, du_2. \qquad (17.20)

Thus u_2 is distributed independently of u_1 and θ and we have

    \operatorname{var} u_1 = \frac{v(1 - \rho^2)}{2(1 - \rho)} = \frac{1 + \rho}{2}\, v. \qquad (17.21)

Now t_1 is a most-efficient estimator and hence

    \frac{1 + \rho}{2}\, v = \operatorname{var} u_1 \ge \operatorname{var} t_1 = v,

giving

    \frac{1 + \rho}{2} \ge 1. \qquad (17.22)

But ρ cannot be greater than unity and hence ρ = 1, which proves the theorem.


17.15. Consider once again the estimation of θ in the normal population (17.1).
The joint distribution of the sample is given by

    dF = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\frac{1}{2} \sum_{j=1}^{n} (x_j - \theta)^2\Big\}\, dx_1 \ldots dx_n. \qquad (17.23)

We have the familiar result

    \sum_{j=1}^{n} (x_j - \theta)^2 = \sum (x - \bar{x})^2 + n(\bar{x} - \theta)^2,

and hence

    dF = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\frac{n}{2}(\bar{x} - \theta)^2\Big\} \exp\Big\{-\frac{1}{2} \sum (x - \bar{x})^2\Big\}\, dx_1 \ldots dx_n. \qquad (17.24)

Thus the frequency function of the distribution of x's (which is equivalent to the likelihood
function) can be factorised into two parts, one depending on x̄ and θ, the other depending
on the x's but not on θ.

The quantity x̄ is then said to be a sufficient estimator of θ; and generally, if the
likelihood function is expressible in the form (as a product of two frequency functions)

    L(x_1, \ldots, x_n, \theta) = L_1(t, \theta)\, L_2(x_1, \ldots, x_n), \qquad (17.25)

where L_1 does not contain the x's otherwise than in the form t and L_2 is independent of θ,
t is said to be a sufficient estimator of θ.
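
One consequence of this factorisation is easily checked numerically: for the normal population (17.1), the ratio of the likelihoods at two values of θ depends on the sample only through its mean, the factor free of θ cancelling. The Python sketch below compares two quite different samples having the same mean. The function names and the sample values are assumptions made for illustration.

```python
import math


def log_likelihood(sample, theta):
    """Logarithm of the joint frequency function for the normal parent
    with unit variance and mean theta."""
    n = len(sample)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * sum((x - theta) ** 2 for x in sample))


def log_likelihood_ratio(sample, theta_a, theta_b):
    """log L(theta_a) - log L(theta_b); the factor free of theta cancels."""
    return log_likelihood(sample, theta_a) - log_likelihood(sample, theta_b)


# Two hypothetical samples with the same mean (2.0) but different scatter.
sample_1 = [0.0, 1.0, 2.0, 5.0]
sample_2 = [2.0, 2.0, 2.0, 2.0]

print(log_likelihood_ratio(sample_1, 1.5, 3.0))
print(log_likelihood_ratio(sample_2, 1.5, 3.0))
```

The two printed values agree, although the likelihoods themselves differ widely between the samples.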


17.16. As so defined, a sufficient estimator, if it exists at all, is unique except that
if t obeys the relation (17.25) any function of t will obviously also obey the same relation.
From all such functions we must evidently choose one which gives a consistent estimator
and can sometimes, as in the example of the previous section, find the estimator which is
unbiassed. Apart from such ambiguities, which offer no difficulties in practice, the property
of uniqueness holds. For if t_1 and t_2 were two different sufficient statistics, not functionally
related, we should have


    L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n) = M_1(t_2, \theta)\, M_2(x_1, \ldots, x_n),

and hence

    \frac{L_1(t_1, \theta)}{M_1(t_2, \theta)} = \frac{M_2}{L_2}. \qquad (17.26)

Since the expression on the right does not contain θ, L_1 must be a factor of M_1 and moreover
the quotient must be a constant; for if it were a function of the x's that function
would have been assimilated to L_2 or M_2.

Hence

    L_1(t_1, \theta) = k\, M_1(t_2, \theta),

and this cannot be so unless t_1 and t_2 are functionally related.


17.17. The fundamental property of sufficient estimators derives from the following
theorem:

If t_1 is sufficient and t_2 is any other estimator of θ (not a function of t_1) the joint distribution
of t_1 and t_2 may be put in the form

    dF = f_1(t_1, \theta)\, f_2(t_2, t_1)\, dt_1\, dt_2, \qquad (17.27)

where f_2 does not contain θ. Conversely, if (17.27) holds for every t_2 then t_1 is sufficient.

Before proving this result let us notice its importance. From (17.27) it follows that
for any given t_1 the distribution of t_2 is equal to f_2(t_2, t_1) dt_2, i.e. is independent of θ.
Consequently, if we know t_1, the probability of any range of values of t_2 is the same for all θ.
The distribution of t_2 given t_1, therefore, can throw no light whatever on θ. Thus, a knowledge
of t_1 gives all the information that the sample can supply about θ and no other
estimator can add anything to it. We are clearly justified in such circumstances in
describing a sufficient estimator as the “ best ”.

Now as to the theorem itself. The direct part is easily proved. In fact, we have from
(17.25)

    L(x_1, \ldots, x_n, \theta)\, dx_1 \ldots dx_n = L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n)\, dx_1 \ldots dx_n. \qquad (17.28)

Make the transformation

    y_1 = t_1(x_1, \ldots, x_n),
    y_2 = t_2(x_1, \ldots, x_n),
    y_3 = x_3,
    \ldots
    y_n = x_n.

The element of frequency becomes

    L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n) \left| \frac{\partial(x_1, x_2)}{\partial(t_1, t_2)} \right| dy_1 \ldots dy_n, \qquad (17.29)

where the t's and x's are to be expressed in terms of the y's. We have excluded the case
when t_2 is functionally related to t_1, and hence the Jacobian ∂(x_1, x_2)/∂(t_1, t_2) does not
vanish identically. The frequency element of y_1 and y_2 is then obtained from (17.29) by
integrating out the other variables. Since y_1 and y_2 are equal respectively to t_1 and t_2
this process will leave unchanged the function L_1(t_1, θ) and reduce the other part to a
function of t_1 and t_2, say f_2(t_1, t_2). Writing f_1 for L_1 we then have

    dF = f_1(t_1, \theta)\, f_2(t_1, t_2)\, dt_1\, dt_2,

as stated in the theorem.

The converse is a little more difficult. Let t_1 be sufficient and make the transformation
y_1 = t_1, y_2 = x_2, etc. The joint distribution of sample values becomes

    L(x_1, \ldots, x_n) = L'(t_1, y_2, \ldots, y_n) \left| \frac{\partial t_1}{\partial x_1} \right|. \qquad (17.30)

Since t_1 is independent of θ, so is ∂t_1/∂x_1. Hence, if the distribution of t_1 is f(t_1) dt_1, L' may
be written

    f(t_1)\, L''(t_1, y_2, \ldots, y_n), \qquad (17.31)

and the converse will be established if we can show that L'' does not contain θ. This we
do by demonstrating that if there are values y'_2, ..., y'_n for which L'' assumes different
values for different values of θ then the joint distribution of t_1 and t_2 cannot be independent
of θ, which contradicts our hypothesis.

Suppose, then, that for two values of θ, say θ_1 and θ_2,

    L''(t_1, y_2, \ldots, y_n)_{\theta_1} = L''(t_1, y_2, \ldots, y_n)_{\theta_2} + 2a, \qquad (17.32)

where a is not zero. Consider a new statistic t_a defined by

    t_a^2 = \sum_{j=2}^{n} (y_j - y'_j)^2. \qquad (17.33)

Assuming that L'' is continuous in the y's, we may determine a value of t_a, say t'_a, such that

    L''(t_1, y_2, \ldots, y_n)_{\theta_1} > L''(t_1, y_2, \ldots, y_n)_{\theta_2} + a \qquad (17.34)

everywhere inside the range of values bounded by

    t'^{2}_a = \sum (y - y')^2.

Then for any fixed t_1 the total frequency inside this range is obtained by integrating L''
over the appropriate values, and we shall find, in virtue of (17.34),

    f_{\theta_1} > f_{\theta_2}, \qquad (17.35)

the f's referring to total frequencies.

But if the joint distribution of t_1 and t_a is

    dF = A(t_1, t_a)_{\theta}\, dt_1\, dt_a

we have for the frequencies f,

    f_{\theta_1} = \int_0^{t'_a} A(t_1, t_a)_{\theta_1}\, dt_a,
    f_{\theta_2} = \int_0^{t'_a} A(t_1, t_a)_{\theta_2}\, dt_a,

and hence

    \int_0^{t'_a} \{A(t_1, t_a)_{\theta_1} - A(t_1, t_a)_{\theta_2}\}\, dt_a > 0,

so that the joint distribution cannot be independent of θ.

The above demonstration relates to the case when the frequency functions are continuous.
In the discontinuous case the argument simplifies and we leave it to the reader
to supply the proof. 


17.18. We now prove an important further result to the effect that a sufficient
estimator is most-efficient, provided that a most-efficient estimator exists. We assume
that the joint distribution of the sufficient estimator t_1 and any other estimator t_2 tends
to normality for large n, say in the form

    dF \propto \exp\Big[-\frac{1}{2(1 - \rho^2)} \Big\{\frac{(t_1 - \theta)^2}{v_1} - \frac{2\rho (t_1 - \theta)(t_2 - \theta)}{\sqrt{v_1 v_2}} + \frac{(t_2 - \theta)^2}{v_2}\Big\}\Big]\, dt_1\, dt_2, \qquad (17.36)

where v_1 and v_2 are the variances of t_1 and t_2 respectively. Since t_1 is sufficient, the distribution
of t_2 given t_1 does not contain θ. Now the distribution of t_1 is

    dF \propto \exp\Big\{-\frac{(t_1 - \theta)^2}{2 v_1}\Big\}\, dt_1, \qquad (17.37)

and hence that of t_2 given t_1 is

    dF \propto \exp\Big[-\frac{1}{2(1 - \rho^2)} \Big\{\frac{(t_1 - \theta)^2}{v_1} - \frac{2\rho (t_1 - \theta)(t_2 - \theta)}{\sqrt{v_1 v_2}} + \frac{(t_2 - \theta)^2}{v_2}\Big\} + \frac{(t_1 - \theta)^2}{2 v_1}\Big]\, dt_2,

which reduces to

    dF \propto \exp\Big[-\frac{1}{2(1 - \rho^2)} \Big\{\frac{t_2 - \theta}{\sqrt{v_2}} - \rho\, \frac{t_1 - \theta}{\sqrt{v_1}}\Big\}^2\Big]\, dt_2. \qquad (17.38)

If this is not to involve θ we must have

    \rho = \sqrt{\frac{v_1}{v_2}} = \sqrt{E}, \quad \text{where } E \text{ is the efficiency of } t_2. \qquad (17.39)

Since ρ ≤ 1 it follows that v_1 ≤ v_2, i.e. t_1 has a variance not exceeding that of any other estimator.
Consequently, if there exists a most-efficient statistic, t_1 itself is most-efficient.

17.19. The criterion of sufficiency is not a limiting property. A sufficient estimator
is best for any sample size since it gives all the information about θ that the sample can
give; and it is most-efficient for large samples. If we could always find a sufficient
estimator our problem would be solved, but unfortunately sufficiency is the exception
rather than the rule.


Example 17.6

The frequency element of a sample of n from the population

    dF = \frac{1}{\sigma \sqrt{2\pi}} \exp\Big\{-\frac{1}{2} \frac{(x - m)^2}{\sigma^2}\Big\}\, dx

can be put in the form

    dF = \frac{\sqrt{n}}{\sigma \sqrt{2\pi}} \exp\Big\{-\frac{n (\bar{x} - m)^2}{2\sigma^2}\Big\} \cdot \frac{n^{(n-1)/2}}{(2\sigma^2)^{(n-1)/2}\, \Gamma\{\tfrac{1}{2}(n - 1)\}}\, e^{-n s^2 / 2\sigma^2}\, s^{n-3}\, d\bar{x}\, ds^2.

(Cf. Example 10.5, vol. I, p. 238.)

If we know σ, then, as we have already seen, x̄ is sufficient for m. But if we know
m, s is not sufficient for σ. In fact, the factorisation in the above equation requires the
appearance of σ in the element relating to x̄, and we cannot separate a factor containing
s and σ alone or the remaining variables alone.

This is what we might expect. If we know the real mean m there is little point in
preferring the sample variance

    s^2 = \frac{1}{n} \sum (x - \bar{x})^2

to the second moment

    s'^2 = \frac{1}{n} \sum (x - m)^2

as an estimator of the parent variance. The distribution of s' is given by

    dF = \frac{n^{n/2}}{(2\sigma^2)^{n/2}\, \Gamma(\tfrac{1}{2} n)} \exp\Big\{-\frac{n s'^2}{2\sigma^2}\Big\}\, (s')^{n-2}\, ds'^2

and this embodies the whole of the frequency element of the sample, apart from differentials
in the other variables. Thus s' is sufficient for σ.

17.20. This completes the first stage of our inquiry. The criteria of consistence,
efficiency and sufficiency provide standards which we shall look for in “ good ” estimators.
Of themselves, however, they do not provide any systematic way of deriving estimators
which obey them. We shall now consider various methods which have been proposed for
providing estimators and examine how far they conform to our criteria. The most
important method is that of maximum likelihood, which will occupy the remainder of this
chapter. In the next chapter we shall consider four others, the method of minimum
variance, the method of minimum χ², the method of least squares, and the method of
inverse probability.


Maximum Likelihood 

17.21. If the frequency function of the parent population is $f(x, \theta)$, the likelihood
function of a sample of n is, by definition,

$$L = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta). \tag{17.40}$$

The Principle of Maximum (or Maximal) Likelihood then states that if there exists a statistic
$t = t(x_1, \ldots, x_n)$ which maximises L for variations of $\theta$, then t is to be taken as an
estimator of $\theta$. In short, t is the solution (if any) of

$$\frac{\partial L}{\partial\theta} = 0, \qquad \frac{\partial^2 L}{\partial\theta^2} < 0. \tag{17.41}$$

Since L is positive, the first equation is equivalent to

$$\frac{1}{L}\frac{\partial L}{\partial\theta} = \frac{\partial}{\partial\theta}\log L = 0, \tag{17.42}$$

a form which is frequently more convenient.
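
As a minimal numerical sketch of the likelihood equation in its logarithmic form, assume a normal parent with unit variance, so that the derivative of log L is simply the sum of (x - theta). The parent, the sample values and the function names `score` and `solve_score` are illustrative assumptions, not part of the text. Bisection is used because this derivative is decreasing in theta, so the root found is a maximum.

```python
def score(theta, xs):
    """d(log L)/d(theta) for a normal parent with unit variance: sum(x - theta)."""
    return sum(x - theta for x in xs)


def solve_score(xs, lo, hi, tol=1e-10):
    """Bisection on the score, which must fall from positive to negative on [lo, hi]."""
    if not (score(lo, xs) > 0 > score(hi, xs)):
        raise ValueError("score must be positive at lo and negative at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return 0.5 * (lo + hi)


sample = [1.2, 0.7, 2.9, 1.6, 2.1]   # illustrative data
t = solve_score(sample, min(sample) - 1.0, max(sample) + 1.0)
print(t)  # agrees with the sample mean to within the tolerance
```

For this parent the root can of course be written down directly; the iterative form is shown only because it carries over to cases where the equation has no explicit solution.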

There is one small point to notice here. In our usual convention, if a frequency
function has a finite range, we regard it as defined from $-\infty$ to $+\infty$ but as zero outside
that range. In this chapter we shall occasionally meet the reciprocal of f, which is undefined
for zero f. Unless the contrary is specified we shall suppose that where f is zero 1/f is also
to be regarded as zero. This will enable us to continue to regard the range as infinite, but
some care is necessary where f is assumed everywhere continuous, for discontinuities may
appear in f and 1/f at the terminals of the finite range. The point becomes important
when we try to make certain existence theorems rigorous.


17.22. In sections 7.27 to 7.31 we touched on the principle of maximum likelihood
from the point of view of statistical logic. We pointed out that its adoption required a
new postulate in the theory of inference, but referred to the fact that the principle was
recommended by the statistical properties of the estimators to which it leads. We now
proceed to prove a series of theorems about these estimators, from which it will be seen
that the posterior recommendation, so to speak, is very strong. In fact, maximum
likelihood estimators are consistent, tend to normality for large n, have minimum variance,
in the limit at least, and provide sufficient statistics where such exist.

17.23. The reader may feel convinced intuitively that maximum likelihood estimators
are consistent, in which case he can pass to the next section. We shall now prove the
result formally.

(a) If the frequency function $f(x, \theta)$ is continuous in x throughout its range, and

(b) if $f(x, \theta)$ is continuous and monotonic in $\theta$ in some $\theta$-interval containing the true
value of $\theta$, say $\theta_0$, and for all x in some x-interval,
then the maximum likelihood estimator of $\theta$, say t, is consistent.

Our proof will also cover the case of discontinuous variates which can be reduced to 
the continuous case by replacing each value by an interval in which the frequency is 
uniformly distributed. 

We first eliminate an inconvenience due to the infinitude of the range. In fact, if the
range is infinite we make the variate transformation x = tan y. The conditions (a) and (b)
remain true of y, and the maximum likelihood estimator in x transforms to that in y. We
may therefore take the range as finite.

The next step is to reduce the case to one of grouped frequencies by dividing the range
into m intervals, the width of the jth interval being $l_j$. (We shall decide on the actual
values of the l's below.) Writing

$$f_j = \int_{l_j} f(x, \theta)\, dx, \tag{17.43}$$

we have, in virtue of the continuity of f in x, that $f_j/l_j$ differs as little as we please from
$f(x_j, \theta)$. Then if $L'$ is the likelihood of the grouped data, proportional to

$$\prod_{j=1}^{m} (f_j)^{n_j}, \tag{17.44}$$

where $n_j$ is the number of observations in the jth interval, we have, except for constants,

$$\log L' = \sum_{j=1}^{m} n_j \log f_j - \sum_{j=1}^{m} n_j \log l_j \tag{17.45}$$

and this will differ arbitrarily little from the logarithm of the true likelihood

$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta), \tag{17.46}$$

provided that we take m large enough and the l's in consequence small enough.

Hence we see that if t is the estimator which maximises L and t' that which maximises
L', in virtue of hypothesis (b) that L and L' are continuous in $\theta$, t and t' will differ as little
as we please for any given values of the x's and that uniformly. We may therefore prove
our theorem for the finite number of variables $n_j$ and infer its truth for the continuous
case by proceeding to the limit.

In different samples the $n_j$ will vary, subject only to the condition that $\Sigma(n_j) = n$.
Let us choose the ranges $l_j$ such that $f_j(\theta_0) = 1/m$ for all j, that is to say, such that the
frequencies in all intervals are equal when $\theta$ takes its true value $\theta_0$. Consider the likelihood
function

$$K = \sum_{j} n_j \log z_j \tag{17.47}$$

where the z's are subject only to the condition

$$\Sigma(z_j) = 1. \tag{17.48}$$

We consider three values of K defined by particular values of the z's.

(a) When $z_j = n_j/n$, K is a maximum, say $K_R$. For we have

$$\delta K = \sum_j \frac{n_j}{z_j}\,\delta z_j, \qquad 0 = \sum_j \delta z_j,$$

and hence

$$\frac{n_1}{z_1} = \frac{n_2}{z_2} = \cdots = \frac{\Sigma(n)}{\Sigma(z)} = n.$$

(b) When $z_j = f_j(\theta_0) = 1/m$, K is, say, $K_M$.

(c) When the estimator t' assumes the value, say $t'_0$, corresponding to the $n_j$, and
hence $z_j = f_j(t'_0)$, K is a maximum, say $K_Z$, among the particular set of values of $\theta$ for
which $z_j = f_j(\theta)$; for this is our definition of t'.

We have at once that

$$K_R \ge K_Z \ge K_M. \tag{17.49}$$

Now, as the sample increases, the observed $n_j/n$ converge in probability to their
theoretical values $f_j(\theta_0) = 1/m$. Since K is continuous in the z's, $K_R - K_M$ will converge
to zero in probability and, from (17.49), so will $K_R - K_Z$.

Now we show that this entails that each of

$$\left| f_j(t'_0) - f_j(\theta_0) \right|$$

converges to zero in probability. In fact, since $\left| f_j(\theta_0) - n_j/n \right|$ does so, it will be enough
to prove that the same holds for

$$\left| f_j(t'_0) - \frac{n_j}{n} \right|. \tag{17.50}$$

Let $K_1$ be the maximum of K for some fixed $z_1$. Then $K_R \ge K_1 \ge K$ for every admissible
set of z's having that $z_1$, and

$$K_R - K \ge K_R - K_1.$$

Hence $K_R - K_1$ converges to zero whenever $K_R - K$ does. The maximum $K_1$ is readily seen to be given by

$$z_j = \frac{n_j (1 - z_1)}{n - n_1}, \qquad j = 2, \ldots, m, \tag{17.51}$$

$$K_1 = n_1 \log z_1 + (n - n_1)\{\log(1 - z_1) - \log(n - n_1)\} + \sum_{j=2}^{m} n_j \log n_j. \tag{17.52}$$

Now $z_1$ is a double-valued function of $K_1$, continuous and having its two values equal
for $K_1 = K_R$; for $K_1$ is continuous in $z_1$ from 0 to 1 (not inclusive), and $\partial K_1/\partial z_1$ changes sign
only for $z_1 = n_1/n$, where $K_1 = K_R$. It follows that when $K_R - K_1$ is small, so is
$z_1 - n_1/n$. If the other z's are not given by (17.51), $K_R - K_1$ is smaller still than $K_R - K$.

A similar argument applies for any j, and hence $\left| z_j - n_j/n \right|$ converges to zero in probability
when $K_R - K$ does so. Taking $z_j = f_j(t'_0)$ and remembering that in this case K
becomes $K_Z$, we reach (17.50).

Finally, by hypotheses (a) and (b) at least some of the $f_j(\theta)$ have continuous inverse
functions expressing $\theta$ in terms of the functions f, and hence by taking

$$\left| f_j(t'_0) - f_j(\theta_0) \right|$$

as small as we please, we may make $t'_0 - \theta_0$ as small as we please. Consequently t' converges
to $\theta_0$ in probability and is consistent.


17.24. The reader may find the foregoing proof easier to follow if we express its 
main points in geometrical terminology. 

Consider the m proportions $n_j/n$ as the co-ordinates of a point in a space of m
dimensions. The theoretical frequencies $f_j(\theta_0) = 1/m$ define a point, say M, in
this space, and the sample point R, corresponding to an observed set of $n_j$'s, may
be regarded as varying round the "theoretical" point M. The quantities z are
the co-ordinates of any point in the hyperplane $\Sigma(z) = 1$, which contains M and R.
(See Fig. 17.1.)

Now, for any sample point R the maximum likelihood estimator t' assumes
a value $t'_0$ which in general differs from $\theta_0$. This value defines m quantities $f_j(t'_0)$
which determine a point Z. This also lies in the hyperplane since the sum of
the frequencies is unity. Thus the points R determine a set of points Z which all lie on
the curve defined for variations in $\theta$ by

$$z_j = f_j(\theta). \tag{17.53}$$

Fig. 17.1.

Since $\theta = \theta_0$ is a possible value of $\theta$, the point M lies on this curve; R in general does
not.

What we have shown in analytical form is that the function K, which is the logarithm
of a likelihood function defined for any point on the hyperplane, has a maximum at R
and a maximum on the curve itself at Z. As the sample size increases, R is as near as
we like to M (in the sense of convergence in probability, that is to say, that as high a
proportion of points R as we like are as near as we like to M). This involves that Z also is as
near as we like to M. This in turn involves that the parameter-value $t'_0$ corresponding to
Z is as close as we like to $\theta_0$ for as high a proportion of the possible points Z as we like,
which is our theorem.


17.25. We now prove a second fundamental property of maximum likelihood
estimators, namely that they tend to normality for large n. More precisely,

If condition (a) at the beginning of 17.23 is satisfied; and if (more stringently
than condition (b) of that section) (c) in a $\theta$-interval containing the true value $\theta_0$,
$\partial f/\partial\theta$ is continuous in $\theta$ for every x, $x^2\,\partial f/\partial\theta$ approaches a continuous function of $\theta$ as x
tends to infinity, and $\partial f/\partial\theta$ does not vanish in some interval,

then the maximum likelihood estimator t tends to normality for large n. The condition
as to $x^2\,\partial f/\partial\theta$ ensures that in the transformation to finite range $\partial f/\partial\theta$ remains continuous in $\theta$
throughout that range.

We recall that if

$$\xi_j = \frac{n_j}{n} - \frac{1}{m}, \tag{17.54}$$

that is, if the $\xi$'s are the deviations of the actual proportional frequencies $n_j/n$ from the
"expected" frequencies 1/m, the distribution of the $\xi$'s in the limit will be normal and their
distribution spherically symmetric. Consider again the orthogonal space of the previous
section. The sample points are distributed about the point M in a symmetrical form which
tends to normality. If we choose a set of orthogonal axes in the hyperplane, the projection
of the sample points on any axis is in the limit distributed normally with variance 1/mn.

In the neighbourhood of M the curve (17.53) approaches its tangent line as n becomes
larger, and we therefore have, if s is the distance along the tangent from M,

$$s^2 = (\theta - \theta_0)^2 \sum_j \left\{\frac{\partial f_j}{\partial\theta}\right\}^2, \tag{17.55}$$

as follows from (17.53). (The tangent exists in virtue of our hypothesis as to the differential
coefficients of f in $\theta$.)

Now consider the point Z on the curve corresponding to the sample point R. We
know that at Z the function

$$K = \sum_j n_j \log\left(z_j + \frac{1}{m}\right), \tag{17.56}$$

where we now measure z from M, is a maximum for variations in z such that Z lies on
the curve. R is determined by finding the hypersurface (17.56) tangent to the hyperplane
$\Sigma(z_j) = 0$, for at that point the variation of K within the hyperplane is zero. We know that the co-ordinates of
this point are $z_j = n_j/n - 1/m$ and that R is the point of tangency. $K_R$ as defined in
17.23 is the value of K at R, and $K_Z$ is that at Z. We then have, by Taylor's theorem,

$$K_Z = K_R + \sum_j \frac{\partial K}{\partial z_j}\,\delta z_j + \frac{1}{2}\sum_j\sum_k \frac{\partial^2 K}{\partial z_j\,\partial z_k}\,\delta z_j\,\delta z_k \tag{17.57}$$

to the second order of small quantities in $\delta z$. From (17.56) we see that, at R,

$$\frac{\partial K}{\partial z_j} = n, \tag{17.58}$$

$$\frac{\partial^2 K}{\partial z_j\,\partial z_k} = 0 \quad (j \ne k), \qquad \frac{\partial^2 K}{\partial z_j^2} = -\frac{n^2}{n_j}. \tag{17.59}$$

Hence

$$K_Z = K_R + n\,\Sigma(\delta z_j) - \frac{n^2}{2}\sum_j \frac{(\delta z_j)^2}{n_j}. \tag{17.60}$$

Now $\Sigma(\delta z_j) = 0$, for the variation takes place in the hyperplane. Hence, for given R,
Z is the point for which $\sum_j (\delta z_j)^2/n_j$ is a minimum. As n tends to infinity the $n_j$'s tend to
equality, and hence Z is the point on the curve which is nearest to R. Thus R is, in the
limit, projected orthogonally on to the curve, that is to say, in the limit, on the tangent
line.

Now we know that these points are distributed normally with variance 1/mn and
this proves the theorem. We may also evaluate the variance of the maximum likelihood
estimator; for

$$\operatorname{var} t' = \frac{\operatorname{var} s}{\sum_j (\partial f_j/\partial\theta)^2} = \frac{1}{mn \sum_j (\partial f_j/\partial\theta)^2}, \tag{17.61}$$

and since t' approaches t for fine grouping we have also, remembering that $1/m = f_j(\theta_0)$,

$$\frac{1}{\operatorname{var} t} = n \int_{-\infty}^{\infty} \left(\frac{\partial f}{\partial\theta}\right)^2 \frac{dx}{f}, \tag{17.62}$$

where $\theta$ is to be put equal to $\theta_0$ on the right.

It may be remarked that condition (c) at the beginning of the section prevents the
vanishing of $\partial f/\partial\theta$ which might render the expression (17.61) nugatory.

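17.26. We have, then, under the afore-mentioned conditions,

$$\frac{1}{\operatorname{var} t} = n\, E\left(\frac{\partial \log f}{\partial\theta}\right)^2.$$

If the range is independent of $\theta$, or if f and $\partial f/\partial\theta$ vanish at any extremity of the range which
depends on $\theta$, we have the alternative form:

$$\frac{1}{\operatorname{var} t} = -n\, E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right). \tag{17.63}$$

In fact, since $\int_a^b f\,dx = 1$, where a, b are the limits of the range and may contain $\theta$, we
have,* f vanishing at any extremity which depends on $\theta$,

$$0 = \int_a^b \frac{\partial f}{\partial\theta}\,dx.$$

Differentiating again, we have

$$0 = \int_a^b \frac{\partial^2 f}{\partial\theta^2}\,dx + \left[\frac{\partial f}{\partial\theta}\right]_{x=b}\frac{\partial b}{\partial\theta} - \left[\frac{\partial f}{\partial\theta}\right]_{x=a}\frac{\partial a}{\partial\theta}. \tag{17.64}$$

Again, if the range is independent of $\theta$ or if $\partial f/\partial\theta$ vanishes at the extremity, the last two
terms on the right in (17.64) are zero, and we have (reverting to our usual convention as
to limits)

$$E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) = \int_{-\infty}^{\infty} \left\{\frac{1}{f}\frac{\partial^2 f}{\partial\theta^2} - \frac{1}{f^2}\left(\frac{\partial f}{\partial\theta}\right)^2\right\} f\,dx = -\int_{-\infty}^{\infty} \left(\frac{\partial f}{\partial\theta}\right)^2 \frac{dx}{f},$$

and the result follows from (17.62).

* The operation of differentiating under the integral sign requires certain conditions as to uniform
convergence, even when the limits are independent of $\theta$. To avoid prolixity we shall always assume
that the conditions hold unless the contrary is stated. The point gives rise to no statistical difficulty
but is troublesome when one is aiming at complete mathematical rigour.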
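
The equivalence of the two forms can be checked numerically. The sketch below assumes, purely for illustration, the density f = exp(-x/theta)/theta on 0 <= x < infinity, whose range does not depend on theta; the trapezoidal helper `integrate`, the function name `information_forms` and the truncation point are assumptions of this example.

```python
import math


def integrate(g, a, b, steps=200_000):
    """Composite trapezoidal rule for g on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h


def information_forms(theta):
    """Return (E[(d log f/d theta)^2], -E[d^2 log f/d theta^2]) for f = exp(-x/theta)/theta."""
    f = lambda x: math.exp(-x / theta) / theta
    d1 = lambda x: -1.0 / theta + x / theta**2           # d log f / d theta
    d2 = lambda x: 1.0 / theta**2 - 2.0 * x / theta**3   # d^2 log f / d theta^2
    upper = 60.0 * theta  # the tail beyond this point is negligible
    first = integrate(lambda x: d1(x) ** 2 * f(x), 0.0, upper)
    second = -integrate(lambda x: d2(x) * f(x), 0.0, upper)
    return first, second


first, second = information_forms(2.0)
print(first, second)  # both close to 1 / theta**2
```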


17.27. We now prove a third fundamental property concerning the efficiency of
maximum likelihood estimates.

If t be any estimator of $\theta$, the range of $f(x, \theta)$ is independent of $\theta$, and in large samples
t is distributed normally about mean $\theta_0$ (the true value of $\theta$) with variance v; then
$1/(nv)$ cannot exceed

$$\int_{-\infty}^{\infty} \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dx, \quad \text{with } \theta = \theta_0;$$

and hence, if a maximum likelihood estimator exists, it is most-efficient in the class of
such estimators.

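By hypothesis, we have in the limit for the frequency function of t,

$$\phi = \frac{1}{\sqrt{2\pi v}} \exp\left\{-\frac{(t - \theta_0)^2}{2v}\right\} \tag{17.65}$$

and hence

$$\frac{\partial^2 \log \phi}{\partial\theta^2} = -\frac{1}{v}, \tag{17.66}$$

where, for convenience, we drop the suffix of $\theta_0$ until the end of the proof. We then have

$$\frac{1}{v} = -\int_{-\infty}^{\infty} \frac{\partial^2 \log\phi}{\partial\theta^2}\,\phi\,dt = \int_{-\infty}^{\infty} \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt. \tag{17.67}$$

Now consider

$$u = \frac{1}{L}\frac{\partial L}{\partial\theta} \tag{17.68}$$

as a random variable over the possible values $x_1, \ldots, x_n$ conditioned by t = constant.

Since the frequency of u is L, we have

$$\operatorname{var} u = \frac{\Sigma(L u^2)}{\Sigma(L)} - \left\{\frac{\Sigma(L u)}{\Sigma(L)}\right\}^2 \tag{17.69}$$

with summation (or integration) over the range of x's. Now $\phi$ is the frequency of all
samples having a constant t, and hence

$$\phi = \Sigma(L).$$

Hence

$$\operatorname{var} u = \frac{1}{\phi}\,\Sigma\left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} - \frac{1}{\phi^2}\left\{\Sigma\left(\frac{\partial L}{\partial\theta}\right)\right\}^2. \tag{17.70}$$

Now var u cannot be negative and $\phi$ is not negative, and hence

$$\Sigma\left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} \ge \frac{1}{\phi}\left\{\Sigma\left(\frac{\partial L}{\partial\theta}\right)\right\}^2. \tag{17.71}$$

But

$$\Sigma\left(\frac{\partial L}{\partial\theta}\right) = \frac{\partial\phi}{\partial\theta},$$

and hence, substituting in (17.71) and integrating over all t, we have

$$\int \Sigma\left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} dt \ge \int \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt = \frac{1}{v}. \tag{17.72}$$

Now $\Sigma$ is carried out over all x for constant t and the integration over all t, so that the two
summations together are equivalent to summation over the x's without restriction. Hence

$$\frac{1}{v} \le \int \cdots \int \frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2 dx_1 \cdots dx_n = n \int_{-\infty}^{\infty} \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dx,$$

which establishes the result, since the expression on the right is the reciprocal of the variance
of the maximum likelihood estimator, if it exists.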


17.28. The fourth fundamental theorem of maximum likelihood estimators is as
follows:

If a sufficient estimator exists, it is a function of the maximum likelihood estimator.
In fact, the likelihood can then be put in the form

$$L = L_1(t, \theta)\, L_2(x_1, \ldots, x_n),$$

where $L_2$ does not contain $\theta$. Hence

$$\frac{\partial}{\partial\theta}\log L = \frac{\partial}{\partial\theta}\log L_1 = \psi(\theta, t), \text{ a function of } \theta \text{ and } t \text{ only.} \tag{17.73}$$

Hence, for fixed t, $\partial \log L/\partial\theta$ is constant, and it follows from the previous section that the
variance of t is equal to the variance of a most-efficient estimator (for var u is then zero
for fixed t and the inequality (17.72) becomes an equality). Hence the sufficient estimator
is most-efficient, confirming the result of 17.18.

It follows from (17.73) that the maximum likelihood estimator is given by

$$\psi(\theta, t) = 0, \tag{17.74}$$

which proves the theorem.

Conversely, if t is such that (17.73) is true, it must be sufficient; for then we have

$$\log L = C + \int \psi(\theta, t)\, d\theta,$$

where C does not depend on $\theta$ and the likelihood is of the requisite form.

Example 17.7 

Consider the estimation of the parameter m in the population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right\} dx, \qquad -\infty \le x \le \infty,$$

where $\sigma$ is known. The frequency function is easily seen to obey the conditions relating
to maximum likelihood estimators. We have

$$\log L = -n \log\{\sigma\sqrt{2\pi}\} - \frac{1}{2\sigma^2}\,\Sigma(x_j - m)^2,$$

and hence the maximum likelihood estimator is the root of

$$\frac{\partial}{\partial m}\log L = \frac{1}{\sigma^2}\,\Sigma(x - m) = 0,$$

giving

$$\hat{m} = \frac{1}{n}\,\Sigma(x) = \bar{x}.$$

It is frequently convenient to denote the estimator of a parameter by writing a
circumflex accent over it in this way.

In this case the sample mean is the maximum likelihood estimator. It is therefore
most-efficient and no other estimator can have a smaller variance in the limit. For the
variance we have, from (17.63),

$$\frac{1}{\operatorname{var} \hat{m}} = -n\,E\left(\frac{\partial^2 \log f}{\partial m^2}\right) = \frac{n}{\sigma^2},$$

giving the familiar result:

$$\operatorname{var} \bar{x} = \frac{\sigma^2}{n}.$$

This, as it happens, is true for any n. The estimator is also sufficient, for

$$\frac{\partial}{\partial m}\log L = \frac{1}{\sigma^2}(n\bar{x} - nm),$$

a function of m and $\bar{x}$ only.

The condition that $\sigma^2$ is known is to be noted. Complications arise when two parameters
are estimated simultaneously, as we shall see presently.


Example 17.8 

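Consider the estimation of $\theta$ in the Type III distribution

$$dF = \frac{1}{\Gamma(p)}\,\frac{x^{p-1} e^{-x/\theta}}{\theta^p}\,dx, \qquad 0 \le x < \infty,$$

where p is known.
We have

$$\log f = (p - 1)\log x - \frac{x}{\theta} - \log\Gamma(p) - p\log\theta$$

and hence, dropping terms independent of $\theta$,

$$\log L = -\frac{1}{\theta}\,\Sigma(x) - np\log\theta.$$

The equation of maximum likelihood is then

$$\frac{1}{\theta^2}\,\Sigma(x) - \frac{np}{\theta} = 0,$$

giving

$$\hat\theta = \frac{\Sigma(x)}{np} = \frac{\bar{x}}{p}.$$

Also, by (17.63),

$$\frac{1}{\operatorname{var}\hat\theta} = -n\int_0^\infty \left(-\frac{2x}{\theta^3} + \frac{p}{\theta^2}\right) f\,dx = -n\left(-\frac{2p}{\theta^2} + \frac{p}{\theta^2}\right) = \frac{np}{\theta^2},$$

$$\operatorname{var}\hat\theta = \frac{\theta^2}{np},$$

where $\theta$ is the true value of the parameter. We could also have obtained this result directly
(and again it happens to be true for all n). From Example 10.11 (vol. I, p. 244) we have
for the distribution of $\bar{x}/p = \hat\theta$,

$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat\theta^{\,np-1}}{\Gamma(np)}\,\exp\left(-\frac{np\hat\theta}{\theta}\right) d\hat\theta,$$

from which the first two moments about the origin are

$$\mu'_1 = \theta, \qquad \mu'_2 = \frac{np + 1}{np}\,\theta^2,$$

giving

$$\operatorname{var}\hat\theta = \mu'_2 - \mu'^{\,2}_1 = \frac{\theta^2}{np}.$$

We note that the likelihood function may be put in the form

$$\log L = (p - 1)\,\Sigma\log x - n\log\Gamma(p) - \frac{np\hat\theta}{\theta} - np\log\theta,$$

from which it is evident that $\hat\theta$ is sufficient.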
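
The variance theta^2/(np) can be checked by simulation. This is only a sketch: the parameter values, the number of replications and the seed are arbitrary choices made for the illustration, and the function name `simulate_theta_hat` is hypothetical. It relies on the standard-library `random.gammavariate(alpha, beta)`, in which `beta` is the scale parameter, so that alpha = p and beta = theta reproduce the Type III form of this example.

```python
import random
import statistics


def simulate_theta_hat(theta, p, n, reps, seed=0):
    """Draw `reps` samples of size n and return the estimates sum(x) / (n * p)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        xs = [rng.gammavariate(p, theta) for _ in range(n)]
        estimates.append(sum(xs) / (n * p))
    return estimates


theta, p, n = 3.0, 2.0, 5
est = simulate_theta_hat(theta, p, n, reps=20_000)
print(statistics.mean(est))       # should lie near theta
print(statistics.variance(est))   # should lie near theta**2 / (n * p)
```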


Example 17.9 

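Consider the estimation of the parameter $\lambda$ in the Poisson distribution whose general
term is $e^{-\lambda}\lambda^x/x!$.

In this case the likelihood function is discontinuous and we have

$$L = \frac{e^{-n\lambda}\,\lambda^{\Sigma(x)}}{x_1!\, \cdots\, x_n!}.$$

Hence

$$\frac{\partial}{\partial\lambda}\log L = -n + \frac{\Sigma(x)}{\lambda},$$

giving $\hat\lambda = \bar{x}$, the sample mean.
For the variance we have

$$\frac{1}{\operatorname{var}\hat\lambda} = -n\,E\left(-\frac{x}{\lambda^2}\right) = \frac{n}{\lambda},$$

$$\operatorname{var}\hat\lambda = \frac{\lambda}{n}, \text{ a familiar result.}$$

It is easy to see in this case also that $\hat\lambda$ is sufficient.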
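
A short sketch of this example in code, with invented counts; the function names are illustrative. It evaluates the log-likelihood, takes the root of the likelihood equation, and substitutes the estimate into the large-sample variance.

```python
import math


def poisson_log_likelihood(lam, xs):
    """log L = -n*lam + sum(x)*log(lam) - sum(log x!)."""
    return (-len(xs) * lam + sum(xs) * math.log(lam)
            - sum(math.lgamma(x + 1) for x in xs))


def poisson_mle(xs):
    """Root of d(log L)/d(lam) = -n + sum(x)/lam, i.e. the sample mean."""
    return sum(xs) / len(xs)


counts = [3, 0, 2, 4, 1, 2]          # illustrative data
lam_hat = poisson_mle(counts)
var_hat = lam_hat / len(counts)      # lam / n with the estimate substituted for lam
print(lam_hat, var_hat)
```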


Example 17.10 

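What is the most general form of distribution, differentiable in $\theta$, for which the
sample-mean is the maximum likelihood estimator?

We are given that a solution of

$$\frac{\partial}{\partial\theta}\log L = \Sigma\left(\frac{1}{f}\frac{\partial f}{\partial\theta}\right) = 0$$

is

$$\theta = \frac{1}{n}\,\Sigma(x)$$

or $\Sigma(x - \theta) = 0$.

This is true for all x and $\theta$, and hence

$$\frac{1}{f}\frac{\partial f}{\partial\theta} = (x - \theta)K,$$

where K is independent of x but may be dependent on $\theta$, say equal to $\partial^2\psi/\partial\theta^2$. Then,
integrating,

$$\log f = \int d\theta\,(x - \theta)\frac{\partial^2\psi}{\partial\theta^2} = (x - \theta)\frac{\partial\psi}{\partial\theta} + \psi + \zeta(x),$$

where $\zeta(x)$ is an arbitrary function of x. Hence

$$f = \exp\left\{(x - \theta)\frac{\partial\psi}{\partial\theta} + \psi(\theta) + \zeta(x)\right\},$$

which is the most general form of f. If

$$\psi = \tfrac{1}{2}\theta^2, \qquad \zeta(x) = -\tfrac{1}{2}x^2,$$

the form becomes the normal distribution

$$f = k \exp\{-\tfrac{1}{2}(x - \theta)^2\}.$$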
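
As a further check of the general form, not taken from the text, the Poisson distribution of Example 17.9 (where the sample mean was found to be the maximum likelihood estimator) is recovered by the choices below. The symbols $\psi$ and $\zeta$ denote the two arbitrary functions of the general form.

```latex
\psi(\theta) = \theta\log\theta - \theta, \qquad
\frac{\partial\psi}{\partial\theta} = \log\theta, \qquad
\zeta(x) = -\log x! ,
\\[4pt]
(x-\theta)\log\theta + (\theta\log\theta - \theta) - \log x!
  = x\log\theta - \theta - \log x! ,
\\[4pt]
f = \exp\{x\log\theta - \theta - \log x!\} = \frac{e^{-\theta}\,\theta^{x}}{x!} .
```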


Successive Approximations to Efficient Estimators 

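17.29. In the examples we have just given, the solution of the maximum likelihood
equation was carried out without difficulty. It frequently happens, however, that the
equation is by no means so easy to solve explicitly, though it can sometimes be solved
for particular values of x by iterative methods. Another possibility is to compute an
inefficient estimator and correct it by an extra term, which can be obtained as follows:
Let t' be an inefficient estimator and t a most-efficient estimator. Let

$$\delta = t' - t.$$

Then

$$\operatorname{var}\delta = \operatorname{var} t' + \operatorname{var} t - 2\operatorname{cov}(t', t). \tag{17.75}$$

Remembering that if E is the efficiency of t',

$$\operatorname{var} t = E \operatorname{var} t', \qquad \frac{\operatorname{cov}(t', t)}{(\operatorname{var} t \operatorname{var} t')^{1/2}} = \sqrt{E} \quad \text{(see (17.39))};$$

we have

$$\operatorname{var}\delta = \frac{1 - E}{E}\operatorname{var} t. \tag{17.76}$$

If then t' is "nearly" efficient, that is, if 1 - E is small, the average value of $\delta = t' - t$
will be small.

If the maximum likelihood equation is

$$\frac{\partial \log L}{\partial\theta} = 0,$$

consider

$$t'' = t' + \operatorname{var} t \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'}.$$

We have

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t} + (t' - t)\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t} + \text{terms of higher order} \tag{17.77}$$

$$= (t' - t)\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t},$$

to the first order, since the first term on the right vanishes by the definition of t.
For large n, approximately

$$-\frac{1}{\operatorname{var} t} = \left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t}$$

and hence, approximately,

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = \frac{t - t'}{\operatorname{var} t}.$$

Hence

$$t'' = t' + t - t' = t, \tag{17.78}$$

and t'' is an efficient estimator to a better order of approximation. This process may be
repeated and, rather like Newton's successive approximation to the roots of an equation,
may be expected to improve the efficiency of an estimator.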

Example 17.11 

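Suppose we have to estimate $\theta$, the parameter in the Cauchy population

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty.$$

We have already seen that the sample-mean is not a satisfactory estimate and that for
large samples the median is consistent and has variance $\pi^2/4n$.

The equation of maximum likelihood gives

$$\frac{\partial \log L}{\partial\theta} = \Sigma\left\{\frac{2(x - \theta)}{1 + (x - \theta)^2}\right\} = 0.$$

This is a (2n - 1)-ic in $\theta$ and correspondingly difficult to solve. We may, however,
find the variance of the solution $\hat\theta$ from (17.63). We have

$$\frac{1}{\operatorname{var}\hat\theta} = -\frac{n}{\pi}\int_{-\infty}^{\infty} \frac{2(x - \theta)^2 - 2}{\{1 + (x - \theta)^2\}^3}\,dx = -\frac{2n}{\pi}\int_{-\infty}^{\infty} \frac{(x^2 - 1)\,dx}{(1 + x^2)^3} = \frac{n}{2}.$$

Hence

$$\operatorname{var}\hat\theta = \frac{2}{n}.$$

The median, therefore, has an efficiency of $8/\pi^2 = 0.8$, and we expect that

$$t'' = t' + \frac{4}{n}\,\Sigma\left\{\frac{x - t'}{1 + (x - t')^2}\right\},$$

where t' denotes the median, will be an improved estimator.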
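
The correction of this example amounts to one line of arithmetic once the median is known. The following sketch assumes unit scale as in the example; the function name `cauchy_one_step` and the sample values are illustrative. The factor 4/n is the large-sample variance 2/n of the maximum likelihood estimator multiplied by the factor 2 in the derivative of log L.

```python
import statistics


def cauchy_one_step(xs):
    """Median corrected once: t'' = t' + (4/n) * sum((x - t') / (1 + (x - t')**2))."""
    n = len(xs)
    t1 = statistics.median(xs)   # inefficient starting estimator t'
    correction = (4.0 / n) * sum((x - t1) / (1.0 + (x - t1) ** 2) for x in xs)
    return t1 + correction


print(cauchy_one_step([0.0, 1.0, 5.0]))   # median 1.0, pulled towards the closer observation
print(cauchy_one_step([2.0, 3.0, 4.0]))   # symmetric sample: the correction vanishes
```

The returned value could be fed back in place of the median to repeat the process, in the manner described in 17.29.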


Most General Form of Distributions possessing Sufficient Estimators 
17.30. If t is sufficient for $\theta$ we have

$$\frac{\partial \log L}{\partial\theta} = \Sigma\,\frac{\partial \log f(x_j, \theta)}{\partial\theta} = K, \tag{17.79}$$

where K is some function of t and $\theta$. Regarding this as an equation in t we see that it
remains true for any particular value of $\theta$, say zero. It is then evident that t must be
expressible in the form

$$t = M\{\Sigma k(x)\}, \tag{17.80}$$

where M and k are arbitrary functions. If $w = \Sigma k(x)$ then K is a function of $\theta$ and
w only, say $N(\theta, w)$. We have then

$$\frac{\partial^2 \log L}{\partial\theta\,\partial x_j} = \frac{\partial N}{\partial w}\,\frac{\partial w}{\partial x_j}. \tag{17.81}$$

Now the left-hand side is a function of $\theta$ and $x_j$ only and $\partial w/\partial x_j$ is a function of $x_j$ only. Hence
$\partial N/\partial w$ is a function of $\theta$ and $x_j$ only. But it must be symmetrical in the x's and hence is a
function of $\theta$ only. Hence, integrating with respect to w, we have

$$N(\theta, w) = w\,p(\theta) + q(\theta),$$

where p and q are arbitrary functions of $\theta$. Thus

$$\frac{\partial}{\partial\theta}(\log L) = \Sigma\,\frac{\partial}{\partial\theta}\{\log f(x_j, \theta)\} = p(\theta)\,\Sigma k(x_j) + q(\theta), \tag{17.82}$$

whence

$$\frac{\partial}{\partial\theta}\log f(x, \theta) = p(\theta)\,k(x) + \frac{1}{n}\,q(\theta),$$

giving

$$f(x, \theta) = \exp\{p(\theta)\,k(x) + q(\theta) + r(x)\}, \tag{17.83}$$

where we still write p and q for the integrated functions.

The expression may also be written

$$f(x, \theta) = Q(\theta)\,R(x)\exp\{p(\theta)\,k(x)\} \tag{17.84}$$

or, if we simplify the specification of the distribution by writing $\theta$ instead of $p(\theta)$,

$$f(x) = Q(\theta)\,R(x)\exp\{\theta\,k(x)\}. \tag{17.85}$$

It will be found that if (17.85) holds, the likelihood function is of the required form for
the existence of a sufficient estimator, so that the equation is sufficient as well as necessary.
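
As an illustration that is not in the text, the binomial distribution with known index N and proportion $\varpi$ (both symbols are assumptions of this example) can be thrown into the form (17.85) by taking the log-odds as the parameter:

```latex
f(x) = \binom{N}{x}\varpi^{x}(1-\varpi)^{N-x}
     = (1-\varpi)^{N}\binom{N}{x}\exp\left\{x\log\frac{\varpi}{1-\varpi}\right\},
\\[4pt]
\theta = \log\frac{\varpi}{1-\varpi}, \qquad
Q(\theta) = (1+e^{\theta})^{-N}, \qquad
R(x) = \binom{N}{x}, \qquad
k(x) = x .
```

Here $w = \Sigma k(x) = \Sigma(x)$, so that the total number of successes in the sample is sufficient for $\theta$ and hence for $\varpi$.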


Distribution of Sufficient Estimators 

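17.31. It is remarkable that the distribution of a sufficient estimator can be obtained
directly from the likelihood function. From (17.85) we have

$$\log L = n\log Q + \Sigma\log R(x) + \theta\,\Sigma k(x),$$

giving, for the maximum likelihood estimator,

$$\frac{n}{Q}\frac{\partial Q}{\partial\theta} + \Sigma k(x) = 0. \tag{17.86}$$

Now, for the characteristic function $\phi(\alpha)$ of $w\ (= \Sigma k(x))$ we have

$$\phi(\alpha) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{i\alpha w} f(x_1, \theta)\,dx_1 \cdots f(x_n, \theta)\,dx_n$$

$$= \left\{\int e^{i\alpha k(x)} f\,dx\right\}^n = \left\{\int Q(\theta)\,R(x)\,e^{(i\alpha + \theta)k(x)}\,dx\right\}^n = \left\{\frac{Q(\theta)}{Q(\theta + i\alpha)}\right\}^n.$$

Hence the frequency function of w, if existent, is

$$f(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\alpha w}\left\{\frac{Q(\theta)}{Q(\theta + i\alpha)}\right\}^n d\alpha. \tag{17.87}$$

Now from (17.86),

$$w = -n\left(\frac{1}{Q}\frac{\partial Q}{\partial\theta}\right)_{\theta = t} = n\,S(t), \text{ say,}$$

and hence the frequency function of the estimator t is

$$f(t) = \frac{n}{2\pi}\left(\frac{\partial S}{\partial t}\right)\int_{-\infty}^{\infty} e^{-i\alpha n S(t)}\left\{\frac{Q(\theta)}{Q(\theta + i\alpha)}\right\}^n d\alpha. \tag{17.88}$$

Example 17.12

The normal distribution with unit variance may be put in the form

$$f = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,e^{-\frac{1}{2}\theta^2}\,e^{\theta x}.$$

Comparing this with (17.85), we see that if

$$Q(\theta) = e^{-\frac{1}{2}\theta^2}, \qquad R(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}, \qquad k(x) = x,$$

the condition for a sufficient estimator is satisfied. That this is (as we already know)
the mean $\bar{x}$ may be confirmed from (17.88). We have

$$S(\theta) = -\frac{\partial}{\partial\theta}\log Q = \theta;$$

and hence for the frequency function of the estimator $\bar{x}$,

$$\frac{n}{2\pi}\int_{-\infty}^{\infty} e^{-i\alpha n\bar{x}}\left\{\frac{e^{-\frac{1}{2}\theta^2}}{e^{-\frac{1}{2}(\theta + i\alpha)^2}}\right\}^n d\alpha = \frac{n}{2\pi}\int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}n\alpha^2 - i\alpha n(\bar{x} - \theta)\}\,d\alpha$$

$$= \sqrt{\frac{n}{2\pi}}\,\exp\{-\tfrac{1}{2}n(\bar{x} - \theta)^2\}.$$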


Example 17.13 

The Type III distribution considered in Example 17.8 may be put in the slightly
different form

$$dF = \frac{\gamma^p}{\Gamma(p)}\,x^{p-1}e^{-\gamma x}\,dx, \qquad 0 \le x < \infty.$$

Regarding p as known and considering $\gamma$ as the parameter under estimate, we see that
a sufficient estimator exists, because we may write

$$Q(\gamma) = \frac{\gamma^p}{\Gamma(p)}, \qquad R(x) = x^{p-1}, \qquad k(x) = -x,$$

which throws the distribution into the form (17.85). We have found the estimator and
its distribution in Example 17.8.

On the other hand, suppose that $\gamma$ is known and we wish to estimate p. Writing

$$Q(p) = \frac{\gamma^p}{\Gamma(p)}, \qquad R(x) = e^{-\gamma x - \log x}, \qquad k(x) = \log x,$$

we see that a sufficient estimator for p also exists. It is the solution of

$$\frac{d}{dp}\log\Gamma(p) = \log\gamma + \frac{1}{n}\,\Sigma\log x,$$

which does not permit of expression of p as a simple function of the x's. The sampling
distribution is not expressible in a simple form.


Example 17.14 

Consider again the Cauchy distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty.$$

Evidently this cannot be thrown into the form (17.85) and hence no sufficient estimator
exists. We have already found (Example 17.11) that there is an efficient estimator. For
finite n no single estimator can contain all that the sample can tell us about $\theta$.


Sufficient Estimators when the Range depends on the Parameter

17.32. One of the conditions of the theorem of 17.23 and that of 17.27 is that the
range should be independent of $\theta$. In the contrary case our results, particularly for sufficient
estimators, require reconsideration.

Suppose the range of the frequency function is from $\theta$ to b, where b is fixed. If there
is a sufficient estimator for $\theta$, say t, the distribution of any other estimator, for fixed t, is
independent of $\theta$. Take $x_1$, the lowest value of the sample, as such other estimator. Then
if t is fixed the distribution of $x_1$ is independent of $\theta$, which is clearly impossible unless in
fixing t we also fix $x_1$, that is to say, t is a function of $x_1$. Thus if a sufficient estimator
exists it must be a function of $x_1$.

Similarly if the range is from a to $\theta$, a sufficient estimator for $\theta$ must be a function
of the largest sample member.

17.33. If $x_1$ or some function of it is sufficient for $\theta$, the lower extremity of the range,
and $x_1$ is fixed, the probability that any particular sample value x is greater than $x_1$ is
proportional to $f(x, \theta)$. This must be independent of $\theta$, since $x_1$ is sufficient, and hence
so is $f(x, \theta)/f(x_1, \theta)$. Thus

$$f(x, \theta) = \frac{g(x)}{h(\theta)}, \tag{17.89}$$

and this is the most general form admitting a sufficient estimator.

It remains true in such circumstances that the smallest member of the sample is
a maximum likelihood estimator. For the likelihood is

$$L = \frac{g(x_1) \cdots g(x_n)}{\{h(\theta)\}^n},$$

which is clearly a maximum when $h(\theta)$ is a minimum. Now since the total frequency is
unity we have, from (17.89),

$$h(\theta) = \int_\theta^b g(x)\,dx. \tag{17.90}$$

$\theta$ cannot be greater than $x_1$, for then such a sample value could not appear. The value
which minimises $h(\theta)$ is seen from (17.90) to be that which minimises the range, i.e. $x_1$.
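
A minimal sketch of this argument, assuming the simplest case g(x) = 1, so that the population is rectangular on the range from theta to b and h(theta) = b - theta. The data, the grid of trial values and the function name `likelihood` are illustrative only.

```python
def likelihood(theta, xs, b):
    """L = (b - theta)**(-n) if theta <= min(xs) and max(xs) <= b, otherwise 0."""
    if theta > min(xs) or max(xs) > b or theta >= b:
        return 0.0
    return (b - theta) ** (-len(xs))


xs = [2.4, 3.1, 4.7, 2.9]                 # illustrative sample
b = 5.0                                   # known upper terminal
grid = [i / 100 for i in range(0, 500)]   # trial values of theta
best = max(grid, key=lambda th: likelihood(th, xs, b))
print(best, min(xs))
```

The likelihood rises steadily as theta approaches the smallest observation from below and drops to zero beyond it, so the search settles on the smallest sample member.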

17.34. When both extremes of the range, a and b, depend on $\theta$, some further
modification is necessary. Suppose that a is equal to $\theta$ and that $b(\theta)$ is some strictly decreasing
function of $\theta$. Let $X_n$ be the value such that $b(X_n) = x_n$, the greatest member of the
sample, and let t be the smaller of $x_1$ and $X_n$. Then of the relations

$$t \le x_1, \qquad b(t) \ge x_n \tag{17.91}$$

one at least holds as an equality. But the first equality implies that $t \ge \theta$ and the second that
$b(t) \le b(\theta)$, and either of these two inequalities implies the other. Hence both are true, and

$$\theta \le t \le x_1 \le x_n \le b(t) \le b(\theta). \tag{17.92}$$

Samples with fixed t then lie in a fixed range, and hence t is sufficient if the frequency
function is of the form (17.89). It would seem that this remains the most general form of
frequency function admitting a sufficient estimator when both extremes of the range
depend on $\theta$.


Example 17.15

Consider the rectangular distribution

$$dF = -\frac{dx}{2\theta}, \qquad \theta \le x \le -\theta.$$

If we take the ordinary likelihood equation we get

$$\frac{\partial}{\partial\theta}\log L = \frac{\partial}{\partial\theta}\{-n\log(-2\theta)\} = -\frac{n}{\theta}.$$

For this to vanish $\theta$ must tend to infinity, an obviously nugatory result. In accordance
with the above discussion we should take as our estimate of $\theta$ the smaller of $x_1$ and $-x_n$,
and this is obviously sufficient, for nothing in the sample can tell us more about the
terminals of the range than its most extreme members.


Intrinsic Accuracy

17.35. If the sampling distribution of an estimator t is

$$dF = \phi(t, \theta)\,dt \tag{17.93}$$

we define the accuracy of t as

$$I' = \int_{-\infty}^{\infty}\left(\frac{\partial\log\phi}{\partial\theta}\right)^2\phi\,dt. \tag{17.94}$$

It is evidently essentially a positive quantity. We assume, unless the contrary is stated,
that the range is independent of $\theta$.

I' is the quantity we have already encountered in (17.67) as the reciprocal of the
variance of t when it tends to normality in large samples. As in 17.27, we have

$$I' \le n\int_{-\infty}^{\infty}\left(\frac{\partial\log f}{\partial\theta}\right)^2 f\,dx, \tag{17.95}$$

that is, $I' \le nI$, say, where

$$I = \int_{-\infty}^{\infty}\left(\frac{\partial\log f}{\partial\theta}\right)^2 f\,dx. \tag{17.96}$$

Now I is independent of the estimator t and we may call it the intrinsic accuracy of
the distribution f in regard to $\theta$. It is intrinsic because it depends only on f. It may
be termed accuracy because it provides, for large samples at least, a minimum to the
variance of possible estimators of $\theta$. We know from 17.25 that under certain conditions
the maximum likelihood estimator attains this minimum for large samples.

17.36. We may now extend the definition of efficiency of an estimator to the case 
of small samples. In fact, the efficiency is the ratio of the accuracy of an estimator to the 
intrinsic accuracy of the distribution for the parameter under estimate. This is easily 
seen to apply to the case of large samples for which efficiency was defined in 17.12, and 
may be applied to finite samples or non-normal sampling variation. For such cases, 
however, it is conceivable that the efficiency might exceed unity. A proof that this is not 
so when the range is independent of 0 is suggested in Exercise 17.12. 


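17.37. If the range is independent of $\theta$ we have

$$E\left(\frac{\partial^2\log f}{\partial\theta^2}\right) = -E\left(\frac{\partial\log f}{\partial\theta}\right)^2$$

and hence the following three expressions for the intrinsic accuracy are equivalent:

$$I = E\left(\frac{\partial\log f}{\partial\theta}\right)^2 = E\left\{\frac{1}{f^2}\left(\frac{\partial f}{\partial\theta}\right)^2\right\} = -E\left(\frac{\partial^2\log f}{\partial\theta^2}\right).$$
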

This equivalence holds if $f$ is zero at the extremes of the range. For we then have







But if $f$ is not zero at the extremes the equivalence may break down. (Cf. Exercises 17.9
and 17.11.)


Amount of Information 

17.38. The quantity $nI$ has been called the amount of information about $\theta$ in the
sample of $n$, and $I$ may be called the amount of information per member of the sample.
The use of “information” in this specialised sense has not been universally accepted,
but some of the properties of $I$ are such as we should require of any measure of information.

(a) If the parent does not contain $\theta$, $I = 0$ so that no sample can tell us anything
about $\theta$, which must obviously be so.

(b) Since sufficient estimators contain all the relevant information in the sample
we expect their accuracy to be $nI$, and conversely. That this is so may be seen as
in 17.27 and 17.28. In fact, if $t$ is such that the equality in (17.72) holds, $\operatorname{var} u = 0$
and for fixed $t$, $\partial \log L/\partial\theta$ is constant, irrespective of the form of distribution of $t$.
Log $L$ is then of the type required for sufficiency.

(c) The sum of the amounts of information in two independent sample-members
is the amount of information in the pair taken together. For if their joint distribution is

$$dF = f_1(x, \theta)\,dx\,f_2(y, \theta)\,dy,$$

we have for the intrinsic accuracy

$$-\iint \frac{\partial^2 \log (f_1 f_2)}{\partial \theta^2}\, f_1 f_2\,dx\,dy = -\int \frac{\partial^2 \log f_1}{\partial \theta^2} f_1\,dx - \int \frac{\partial^2 \log f_2}{\partial \theta^2} f_2\,dy = I_1 + I_2, \tag{17.98}$$

which is the property stated.
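Property (c) may be checked numerically. The sketch below takes, purely for illustration, one member from the unit normal and one from the Cauchy form, both centred at $\theta = 0$, and evaluates the double integral of the squared joint score by a midpoint rule. The choice of the two parents and all names in the code are assumptions of ours; the expected value is the sum of the separate intrinsic accuracies, $1 + \tfrac12$.

```python
import math

def pair_information(n_steps=400):
    """Midpoint-rule value of E[(d log(f1 f2)/d theta)^2] at theta = 0,
    with f1 the unit normal and f2 the Cauchy density, both centred at theta."""
    du = math.pi / n_steps
    pts = []
    for i in range(n_steps):
        u = -math.pi / 2 + (i + 0.5) * du
        z = math.tan(u)
        jac = du / math.cos(u) ** 2                  # dx = sec^2(u) du
        w1 = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) * jac
        w2 = 1.0 / (math.pi * (1 + z * z)) * jac
        s1 = z                                       # normal score at theta = 0
        s2 = 2 * z / (1 + z * z)                     # Cauchy score at theta = 0
        pts.append((s1, w1, s2, w2))
    total = 0.0
    for s1, w1, _, _ in pts:
        for _, _, s2, w2 in pts:
            # joint score is the sum of the two separate scores
            total += (s1 + s2) ** 2 * w1 * w2
    return total

info_pair = pair_information()
print(info_pair)
```

The cross-product term contributes nothing because each score has zero mean, which is the substance of (17.98).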


of Accuracy 

17.39. Where no sufficient estimator exists, it follows from (b) of the previous paragraph
that no estimator for finite $n$ can contain all the information in the sample. In
so far as any particular estimator falls short of the ideal we may be said to lose information
by using it. No estimator can avoid losing something, although of course some may
lose less than others.

Presumably the loss will be greater for large samples than for small ones, and will
be least for maximum likelihood estimators. We may calculate the loss in this case. If
$t$ is the maximum likelihood estimator of $\theta$, we have, to a first approximation,

$$\frac{\partial \log L}{\partial \theta} = (\theta - t)\,\frac{\partial^2 \log L}{\partial \theta^2}. \tag{17.99}$$

The variance of $\partial \log L/\partial\theta$ in samples for which $t$ is constant is thus the variance of
$\partial^2 \log L/\partial\theta^2$ within the set multiplied by $(t - \theta)^2$. Now the total loss of information, from 17.27,
is seen to be $\operatorname{var} u = \operatorname{var}(\partial \log L/\partial\theta)$ and hence is equal to the variance of $t$ multiplied
by the total variance of $\partial^2 \log L/\partial\theta^2$ within sets for which $t$ is constant. This we now evaluate.

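Suppose the distribution is grouped so that the “expected” frequency in the $j$th
group is $m_j$. The likelihood is then proportional to $m_1^{n_1} m_2^{n_2} \ldots$ and apart from
constants independent of $\theta$ we have

$$\log L = \Sigma\,(n_j \log m_j). \tag{17.100}$$

We have at once

$$\frac{\partial \log L}{\partial \theta} = \Sigma\left(\frac{m'}{m}\,n\right), \quad \text{where } m' = \frac{\partial m}{\partial \theta}, \tag{17.101}$$

$$\frac{\partial^2 \log L}{\partial \theta^2} = \Sigma\left\{\left(\frac{m''}{m} - \frac{m'^2}{m^2}\right) n\right\}, \tag{17.102}$$

and, since $\Sigma(m'') = 0$,

$$E\left(\frac{\partial^2 \log L}{\partial \theta^2}\right) = -\Sigma\left(\frac{m'^2}{m}\right). \tag{17.103}$$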
We shall find it most convenient to regard the $n$'s as distributed over the groups first of
all without restriction and then subject to two linear constraints expressed by $\Sigma(n_j) = n$
and

$$\frac{\partial \log L}{\partial \theta} = \Sigma\left(\frac{m'}{m}\,n\right) = \text{constant}.$$

From this viewpoint the $n$'s may be regarded as
distributed in the Poisson form with mean and variance $m$ (not the binomial because we
are not introducing the restriction that the samples should be of fixed size, except as a
constraint).

Now if $\Sigma(k_j n_j)$ is a linear function of the $n$'s subject to a linear constraint $\Sigma(\alpha_j n_j) = p$,
its variance is

$$\Sigma(k^2 m) - \frac{\Sigma^2(k\alpha m)}{\Sigma(m\alpha^2)}, \tag{17.104}$$

and a second constraint reduces the variance by a term similar to the second in this expression.
The result may be seen from geometrical considerations. We may write

$$\Sigma(kn) = \Sigma\left(k\sqrt{m}\cdot\frac{n}{\sqrt{m}}\right)$$

and

$$\Sigma(\alpha n) = \Sigma\left(\alpha\sqrt{m}\cdot\frac{n}{\sqrt{m}}\right),$$

where the variables $n/\sqrt{m}$ have unit variance and mean $\sqrt{m}$. Consider the different values
of $n/\sqrt{m}$, say $s$ in number, as the co-ordinates in a Euclidean space. The density function
of the variables is then symmetrical about a point $(\sqrt{m_1}, \sqrt{m_2}, \ldots, \sqrt{m_s})$ to which we
transfer the origin. The variance of the unconstrained variables is then equal to the
reciprocal of the squared distance from the origin to the hyperplane $\Sigma(k\sqrt{m}\,x) = 1$, namely, to
$\Sigma(k^2 m)$. But when the constraint is imposed, the variance becomes proportional to the
reciprocal of the squared distance from the origin to the hyperplane in the direction parallel to
$\Sigma(\alpha\sqrt{m}\,x) = 0$ and is hence reduced by the amount

$$\cos^2\phi\;\Sigma(k^2 m),$$

where $\phi$ is the angle between the planes. This quantity is

$$\frac{\Sigma^2(k\sqrt{m}\cdot\alpha\sqrt{m})}{\Sigma(k^2 m)\,\Sigma(\alpha^2 m)}\;\Sigma(k^2 m),$$

which gives us the second term in (17.104).

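Now for the first linear constraint $\Sigma(n) = \text{constant} = n$ we have $\alpha = 1$, and the
reducing term is (since $\Sigma(m) = n$ also):

$$\frac{1}{n}\,\Sigma^2(km).$$

For the second constraint we have $\alpha = m'/m$ and hence the term is

$$\frac{\Sigma^2(km')}{\Sigma(m'^2/m)}.$$

Thus the variance of $\Sigma(kn)$ is

$$\Sigma(k^2 m) - \frac{1}{n}\,\Sigma^2(km) - \frac{\Sigma^2(km')}{\Sigma(m'^2/m)}. \tag{17.105}$$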
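Now taking

$$k = \frac{m''}{m} - \frac{m'^2}{m^2}$$

and remembering that

$$\operatorname{var} t = \frac{1}{\Sigma(m'^2/m)},$$

we see from (17.102) that the loss of information is, for large samples,

$$\frac{\Sigma\left\{\left(\dfrac{m''}{m} - \dfrac{m'^2}{m^2}\right)^2 m\right\} - \dfrac{1}{n}\,\Sigma^2\left(\dfrac{m'^2}{m}\right) - \dfrac{\Sigma^2\left\{m'\left(\dfrac{m''}{m} - \dfrac{m'^2}{m^2}\right)\right\}}{\Sigma(m'^2/m)}}{\Sigma(m'^2/m)}. \tag{17.106}$$

Here the middle term follows from $\Sigma(km) = -\Sigma(m'^2/m)$, since $\Sigma(m'') = 0$.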


By considering the width of the groups as tending to zero we may apply this result 
also to continuous distributions. 


Example 17.16 

In the distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty,$$

there is no sufficient estimator, as we have seen. Let us consider the loss of information
consequent upon using the maximum likelihood estimator.

We may write for our “expected” value $m$

$$m = \frac{n}{\pi}\,\frac{dx}{1 + (x - \theta)^2}.$$

Hence, writing $p = x - \theta$,

$$\Sigma\left(\frac{m'^2}{m}\right) = \frac{n}{\pi}\int_{-\infty}^{\infty} \frac{4p^2\,dp}{(1 + p^2)^3} = \frac{n}{2},$$

$$\Sigma\left\{\left(\frac{m''}{m} - \frac{m'^2}{m^2}\right)^2 m\right\} = \frac{n}{\pi}\int_{-\infty}^{\infty} \frac{4(p^2 - 1)^2\,dp}{(1 + p^2)^5} = \frac{7n}{8}.$$

Hence, from (17.106), the loss of information is

$$\frac{7}{4} - \frac{1}{2} + 0 = \frac{5}{4}.$$

The intrinsic accuracy of the original distribution is $\tfrac12$, so the loss of information is equivalent
to $2\tfrac12$ observations for large samples. For small samples it will presumably be smaller,
since it vanishes for samples of one. The loss by use of the maximum likelihood estimator
is therefore very slight and becomes of diminishing importance as the size of the sample
increases.
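The arithmetic of this example can be reproduced by direct quadrature. The sketch below works per observation (that is, with $n = 1$), replaces the sums over groups by integrals over $p = x - \theta$, and assembles the three terms of (17.106). The helper names and the tangent substitution are our own devices, not part of the original calculation.

```python
import math

def integrate_over_line(g, n_steps=20000):
    """Midpoint rule for the integral of g(p) over the real line via p = tan(u)."""
    du = math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        u = -math.pi / 2 + (i + 0.5) * du
        total += g(math.tan(u)) / math.cos(u) ** 2 * du
    return total

# Per-observation quantities (n = 1), with p = x - theta.
def m(p):
    """Expected frequency density."""
    return 1.0 / (math.pi * (1 + p * p))

def m1(p):
    """dm/dtheta."""
    return 2 * p / (math.pi * (1 + p * p) ** 2)

def k(p):
    """m''/m - m'^2/m^2, i.e. the second derivative of log m in theta."""
    return 2 * (p * p - 1) / (1 + p * p) ** 2

sum_m1sq_over_m = integrate_over_line(lambda p: m1(p) ** 2 / m(p))   # theory: 1/2
sum_k2m = integrate_over_line(lambda p: k(p) ** 2 * m(p))            # theory: 7/8
sum_km = integrate_over_line(lambda p: k(p) * m(p))                  # theory: -1/2
sum_km1 = integrate_over_line(lambda p: k(p) * m1(p))                # zero by symmetry

# Variance of sum(k n) under the two constraints, as in (17.105), times var t.
loss = (sum_k2m - sum_km ** 2 - sum_km1 ** 2 / sum_m1sq_over_m) / sum_m1sq_over_m
print(loss)
```

The value obtained is the $\tfrac54$ of the text, the third term vanishing because its integrand is an odd function of $p$.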


Ancillary Estimators 

17.40. Where no sufficient estimator exists no single estimator can avoid the loss
of information; but we may take an additional function of the variables which, together
with the maximum likelihood estimator, will give an accuracy tending to unity in large
samples. By taking a third function we can improve the accuracy still further, and so
on. The process is analogous to approximating to the value of a function (the likelihood
function) by ascertaining its differential coefficients at some particular point of the range.

In fact, suppose that, in addition to the estimator which gives $\partial \log L/\partial\theta$ for some value
of $\theta$ such as $t$, we also find $\partial^2 \log L/\partial\theta^2$ for that value. The variance of $\partial \log L/\partial\theta$ over values
in the neighbourhood of those for which these two are constant is then, to the first
approximation, the variance of

$$\tfrac12 (t - \theta)^2\,\frac{\partial^3 \log L}{\partial \theta^3},$$

which has ordinarily a mean value and variance of lower order in $n$. In particular, if $t$
is the maximum likelihood estimator, so that $(\partial \log L/\partial\theta)_{\theta = t} = 0$, the value of
$(\partial^2 \log L/\partial\theta^2)_{\theta = t}$
may provide supplementary information which enables us to approximate more closely
to the likelihood function and hence salvage some of the lost information. Such a quantity
is accordingly called an ancillary estimator. Cf. 17.29 above.


Multivariate Distributions with One Parameter 

17.41. We now proceed to consider the extension of some of the foregoing results
in two directions: (a) where there is more than one variate but still only one parameter,
and (b) where there is more than one parameter to be estimated.

The former raises no new point of difficulty. To take the bivariate case as an example,
if the frequency function is $f(x, y, \theta)$, the likelihood is

$$L = f(x_1, y_1, \theta) \ldots f(x_n, y_n, \theta) \tag{17.107}$$

and our maximum likelihood estimator is obtained by maximising $L$ in the usual way.


Example 17.17 

To estimate the parameter $\rho$ in samples of $n$ from

$$dF = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\,(x^2 - 2\rho xy + y^2)\right\} dx\,dy.$$

We find

$$\log L = \text{constant} - \frac{n}{2}\log(1 - \rho^2) - \frac{1}{2(1 - \rho^2)}\left\{\Sigma(x^2) - 2\rho\,\Sigma(xy) + \Sigma(y^2)\right\},$$

whence, for $\partial \log L/\partial\rho = 0$ we have

$$\frac{n\rho}{1 - \rho^2} - \frac{\rho}{(1 - \rho^2)^2}\left\{\Sigma(x^2) - 2\rho\,\Sigma(xy) + \Sigma(y^2)\right\} + \frac{1}{1 - \rho^2}\,\Sigma(xy) = 0;$$

reducing to the cubic in $\rho$,

$$n\rho(1 - \rho^2) + (1 + \rho^2)\,\Sigma(xy) - \rho\left\{\Sigma(x^2) + \Sigma(y^2)\right\} = 0.$$

It is interesting to note that this does not yield the product-moment of the sample.
We have, after a little reduction,

$$\frac{\partial^2 \log f}{\partial \rho^2} = \frac{1 + \rho^2}{(1 - \rho^2)^2} - \frac{1 + 3\rho^2}{(1 - \rho^2)^3}\,(x^2 - 2\rho xy + y^2) + \frac{4\rho}{(1 - \rho^2)^2}\,xy.$$

Since $E(x^2) = E(y^2) = 1$ and $E(xy) = \rho$, we have, for the estimator $\hat\rho$,

$$-\frac{1}{n \operatorname{var} \hat\rho} = \frac{1 + \rho^2}{(1 - \rho^2)^2} - \frac{2(1 + 3\rho^2)}{(1 - \rho^2)^2} + \frac{4\rho^2}{(1 - \rho^2)^2},$$

whence

$$\operatorname{var} \hat\rho = \frac{(1 - \rho^2)^2}{n(1 + \rho^2)}.$$

This is less (and may be considerably less) than the variance of the sample product-moment
in large samples, $(1 - \rho^2)^2/n$. The efficiency of the latter is $1/(1 + \rho^2)$.
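The cubic of this example is easily solved numerically. The sketch below, which is an illustration under our own assumptions and not a prescribed method, simulates a sample with zero means, unit variances and a hypothetical true value $\rho = 0.6$, and finds a root of the cubic in $(-1, 1)$ by bisection. A root must lie in that interval because the cubic equals $\Sigma(x + y)^2 \ge 0$ at $\rho = -1$ and $-\Sigma(x - y)^2 \le 0$ at $\rho = 1$; in small samples there may be more than one root, and bisection finds only one of them.

```python
import math
import random

def ml_rho(sxx, sxy, syy, n, iterations=200):
    """Root in [-1, 1] of g(r) = n r (1 - r^2) + (1 + r^2) Sxy - r (Sxx + Syy)."""
    def g(r):
        return n * r * (1 - r * r) + (1 + r * r) * sxy - r * (sxx + syy)
    lo, hi = -1.0, 1.0      # g(-1) = sum (x + y)^2 >= 0, g(1) = -sum (x - y)^2 <= 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(1)
rho_true, n = 0.6, 2000     # hypothetical values chosen for the illustration
xs, ys = [], []
for _ in range(n):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(u)
    ys.append(rho_true * u + math.sqrt(1 - rho_true ** 2) * v)

sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)
sxy = sum(x * y for x, y in zip(xs, ys))

rho_ml = ml_rho(sxx, sxy, syy, n)
# Product-moment coefficient taken about the known means (zero), for comparison.
r_product_moment = sxy / math.sqrt(sxx * syy)
print(rho_ml, r_product_moment)
```

The two estimates generally differ slightly, in keeping with the remark that the cubic does not yield the product-moment of the sample.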


Simultaneous Estimation of Several Parameters 

17.42. We now turn to the case when the unknown parameters are more than one
in number. To simplify the exposition we shall consider the case of two parameters $\theta_1$
and $\theta_2$, but examples not infrequently arise where more than two have to be estimated;
for instance, in the fitting of certain Pearson curves there are four. To fix the ideas,
consider the normal distribution

$$dF = \frac{1}{\theta_2\sqrt{2\pi}} \exp\left\{-\frac{1}{2\theta_2^2}\,(x - \theta_1)^2\right\} dx, \qquad -\infty < x < \infty.$$

The likelihood function, except for constants, is given by

$$\log L = -n \log \theta_2 - \frac{1}{2\theta_2^2}\,\Sigma(x - \theta_1)^2. \tag{17.108}$$

It is natural to generalise our principle of estimation by looking for estimators which shall
maximise $L$ for independent simultaneous variations of $\theta_1$ and $\theta_2$, i.e. to require that

$$\frac{\partial \log L}{\partial \theta_1} = 0, \qquad \frac{\partial \log L}{\partial \theta_2} = 0. \tag{17.109}$$

In our case this leads to

$$\Sigma(x - \theta_1) = 0,$$

$$-\frac{n}{\theta_2} + \frac{1}{\theta_2^3}\,\Sigma(x - \theta_1)^2 = 0,$$

whence for the estimators $\hat\theta_1$ and $\hat\theta_2$

$$\hat\theta_1 = \frac{1}{n}\,\Sigma(x) = \bar{x}, \tag{17.110}$$

$$\hat\theta_2^2 = \frac{1}{n}\,\Sigma(x - \bar{x})^2. \tag{17.111}$$

Thus the sample mean and variance are estimates of the population mean and variance.
We note incidentally that the estimator (17.111) is biassed.


17.43. There is one possible source of confusion here which should be removed.
If we know $\theta_1$, then $\hat\theta_2$ is given by

$$\hat\theta_2^2 = \frac{1}{n}\,\Sigma(x - \theta_1)^2, \tag{17.112}$$

which is not the same as (17.111), the sample-mean $\bar{x}$ having been replaced by the known
quantity $\theta_1$. Suppose then we estimate $\theta_1$ by $\bar{x}$, as we may do whether we know $\theta_2$ or not,
since (17.110) does not contain $\theta_2$. We may then ask, what is the estimator of $\theta_2$ which
maximises the likelihood for all samples giving the ascertained value of $\hat\theta_1$, namely, $\bar{x}$?
This is an entirely different question from the one which gave rise to (17.111) and we
must not be surprised if it has a different answer. The variations of $L$ from sample to
sample are now considered in a certain sub-population for which $\bar{x}$ has a fixed value.

In our particular case the problem can be solved explicitly. The likelihood function
can be thrown into the form, with variables $\bar{x}$ and $s$,

$$L\,d\bar{x}\,ds \propto \frac{1}{\theta_2} \exp\left\{-\frac{n}{2\theta_2^2}\,(\bar{x} - \theta_1)^2\right\} d\bar{x} \cdot \frac{s^{n-2}}{\theta_2^{\,n-1}} \exp\left(-\frac{n s^2}{2\theta_2^2}\right) ds, \tag{17.113}$$

where $s^2$ is the sample variance.

If we maximise the likelihood in this form for simultaneous variations of $\theta_1$ and $\theta_2$
we arrive back at (17.110) and (17.111), as of course we must. But if $\bar{x}$ has a fixed value,
the distribution of $s$ becomes of one lower degree of freedom. The likelihood is then
proportional to the second factor in (17.113), viz.

$$\frac{s^{n-2}}{\theta_2^{\,n-1}} \exp\left(-\frac{n s^2}{2\theta_2^2}\right),$$

and for variations of $\theta_2$ this is maximised by

$$\hat\theta_2^2 = \frac{n}{n - 1}\,s^2 = \frac{1}{n - 1}\,\Sigma(x - \bar{x})^2. \tag{17.114}$$

This, it may be noticed, is an unbiassed estimator.
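The two estimators differ only in their divisor, as a small computation makes plain. The sketch below uses a hypothetical sample of five values invented for the purpose; the function name is our own.

```python
def variance_estimates(sample):
    """Return the divisor-n estimate (17.111) and the divisor-(n - 1) estimate (17.114)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)   # sum of squares about the sample mean
    return ss / n, ss / (n - 1)

sample = [4.1, 5.3, 3.8, 6.0, 5.2]   # hypothetical data
joint_ml, conditional_ml = variance_estimates(sample)
print(joint_ml, conditional_ml)
```

For a sample of five the second estimate exceeds the first in the ratio $n/(n - 1) = 5/4$, a discrepancy which disappears as $n$ increases.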


17.44. The difference between (17.111) and (17.114) is apt to be confusing, for both
are, in a sense, maximum likelihood estimators. The distinction arises from the fact that
we are considering the variation of $L$ in two different populations, the first over all samples
of size $n$, the second over the more restricted samples subject to the further constraint
$\Sigma(x) = \text{constant}$. The difference when $n$ is large, of course, is quite unimportant, but
as a theoretical matter the point has some interest.

Which of the two is employed for practical estimation is a matter of choice. At first
sight it may strike the reader as objectionable to use (17.114), because $\bar{x}$ is not known before
the sample is drawn, and there are obvious dangers in basing an inference on properties
of the sample which are determined a posteriori. This objection, however, does not lie
in the present case. We make up our mind beforehand that, whatever $\bar{x}$ may turn out
to be, we will make an inference in relation to the sub-population of samples determined
by it. There is, in fact, no posterior determination of the rule of inference.

17.45. Possibly without realising it, the reader is already accustomed to make an
inference of this kind in relation to a sample number. We do not usually determine beforehand
what size the sample must be; our results (apart from the distinction between small
and large samples, which is another matter) are true for any $n$, whatever $n$ may turn out
to be in practice. In the same way the estimator (17.114) is a maximum likelihood estimator,
whatever $\bar{x}$ may turn out to be, $\bar{x}$ being a property of the sample, just as $n$ is.

The fact remains, of course, that (17.111) and (17.114) give different results. Which
is the better? The answer depends on what we require of the estimator. If we wish
to choose $\hat\theta_1$ and $\hat\theta_2$ so as to maximise their joint likelihood we choose (17.111). If we wish
to select them so that the likelihood is maximised for $\theta_1$ and then, for the observed $\bar{x}$, is
maximised for $\theta_2$, we choose (17.114).

17.46. It may be shown that, as for the case of one parameter, the likelihood estimators
of several parameters are consistent under very general conditions and tend for
large $n$ to be distributed in the multivariate normal form. We omit the proof of these results,
which the reader will probably be willing to accept, and proceed to a generalisation of
the theorem of 17.26. Thus:

(а) If the frequency function / ( x , 0 U 0 2 , . . . 0 p ) is continuous in x\ and 

df 

(б) if in a certain interval containing the true values 0 1O , 0 2O , . . . 0 0 , -~r is 

OVj 

df 

continuous in 0 d for every x , x 2 - approaches a continuous function of 0. for large 

df 

n, and ~ does not vanish in some interval, then 


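$$n \operatorname{cov}(\hat\theta_j, \hat\theta_k) = \frac{\Delta_{jk}}{\Delta}, \tag{17.115}$$

where $\Delta$ is the (Hessian) determinant

$$\Delta = \left|\,\int \left(\frac{\partial \log f}{\partial \theta_j}\right)_0 \left(\frac{\partial \log f}{\partial \theta_k}\right)_0 f\,dx\,\right| \tag{17.116}$$

and $\Delta_{jk}$ is the minor of the $j$th row and $k$th column. When $p = 1$ this reduces to the
case of a single parameter.

As $n$ tends to infinity the joint distribution of the maximum likelihood estimators
tends to the form

$$f = k \exp\left\{-\frac{n}{2} \sum_{j,k=1}^{p} \sigma_{jk}\,(\hat\theta_j - \theta_{j0})(\hat\theta_k - \theta_{k0})\right\}. \tag{17.117}$$

The theorem will be established if we show that

$$\sigma_{jk} = \int \left(\frac{\partial \log f}{\partial \theta_j}\right)\left(\frac{\partial \log f}{\partial \theta_k}\right) f\,dx, \tag{17.118}$$

for then the values of the variances and covariances of the $\hat\theta$'s are as stated in (17.116).
(Compare 15.12.)

Make the transformation

$$q_h = \sum_j A_{hj}\,(\hat\theta_j - \theta_{j0}) \tag{17.119}$$

and choose the $A$'s so that the exponential of (17.117) becomes $-\tfrac{n}{2}\sum_h q_h^2$, i.e. so that

$$\sigma_{jk} = \sum_h A_{hj} A_{hk}. \tag{17.120}$$

The $q$'s are independent normal variates with variance $1/n$. Hence, from the theorem for
the case of a single parameter, already proved, we have

$$\int \left(\frac{\partial \log f}{\partial q_h}\right)^2 f\,dx = 1. \tag{17.121}$$

Further, we have

$$\int \left(\frac{\partial \log f}{\partial q_h}\right)\left(\frac{\partial \log f}{\partial q_l}\right) f\,dx = 0, \tag{17.122}$$

for if we put

$$q_h = \frac{1}{\sqrt{2}}\,(u_h - u_l)$$

and

$$q_l = \frac{1}{\sqrt{2}}\,(u_h + u_l)$$

the expression becomes one half of

$$\int \left\{\left(\frac{\partial \log f}{\partial u_h}\right)^2 - \left(\frac{\partial \log f}{\partial u_l}\right)^2\right\} f\,dx,$$

which vanishes since the $u$'s have the same variance as the $q$'s.

Now

$$\frac{\partial \log f}{\partial \theta_j} = \sum_h \frac{\partial \log f}{\partial q_h}\,\frac{\partial q_h}{\partial \theta_j} = \sum_h A_{hj}\,\frac{\partial \log f}{\partial q_h}.$$

Hence

$$\int \left(\frac{\partial \log f}{\partial \theta_j}\right)\left(\frac{\partial \log f}{\partial \theta_k}\right) f\,dx = \int \sum_h \sum_l A_{hj} A_{lk}\,\frac{\partial \log f}{\partial q_h}\,\frac{\partial \log f}{\partial q_l}\,f\,dx = \sum_h A_{hj} A_{hk},$$

in virtue of (17.121) and (17.122),

$$= \sigma_{jk}$$

from (17.120). The theorem follows.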


Example 17.18 

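Let us estimate the five parameters of the bivariate normal form

$$dF = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{(x - \alpha)^2}{\sigma_1^2} - \frac{2\rho\,(x - \alpha)(y - \beta)}{\sigma_1\sigma_2} + \frac{(y - \beta)^2}{\sigma_2^2}\right\}\right] dx\,dy, \qquad -\infty < x, y < \infty.$$

It will be found that the partial differential coefficients of log $L$ yield, on solution, the
estimators

$$\hat\alpha = \bar{x}, \qquad \hat\beta = \bar{y},$$

$$\hat\sigma_1^2 = \frac{1}{n}\,\Sigma(x - \bar{x})^2,$$

$$\hat\rho\,\hat\sigma_1\hat\sigma_2 = \frac{1}{n}\,\Sigma(x - \bar{x})(y - \bar{y}),$$

$$\hat\sigma_2^2 = \frac{1}{n}\,\Sigma(y - \bar{y})^2,$$

so that for simultaneous estimation the sample means, variances and covariances are
estimates of the corresponding parameters.

To evaluate the sampling variances and covariances we have to evaluate integrals
of the type

$$\iint \left(\frac{\partial \log f}{\partial \theta_j}\right)\left(\frac{\partial \log f}{\partial \theta_k}\right) f\,dx\,dy.$$

These are easily obtainable, being merely functions of moments of different orders.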
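Taking the parameters $\alpha$, $\beta$, $\sigma_1$, $\sigma_2$, $\rho$ in that order, we find for the Hessian (17.116)

$$\begin{vmatrix}
\dfrac{1}{\sigma_1^2(1 - \rho^2)} & -\dfrac{\rho}{\sigma_1\sigma_2(1 - \rho^2)} & 0 & 0 & 0 \\[2ex]
-\dfrac{\rho}{\sigma_1\sigma_2(1 - \rho^2)} & \dfrac{1}{\sigma_2^2(1 - \rho^2)} & 0 & 0 & 0 \\[2ex]
0 & 0 & \dfrac{2 - \rho^2}{\sigma_1^2(1 - \rho^2)} & -\dfrac{\rho^2}{\sigma_1\sigma_2(1 - \rho^2)} & -\dfrac{\rho}{\sigma_1(1 - \rho^2)} \\[2ex]
0 & 0 & -\dfrac{\rho^2}{\sigma_1\sigma_2(1 - \rho^2)} & \dfrac{2 - \rho^2}{\sigma_2^2(1 - \rho^2)} & -\dfrac{\rho}{\sigma_2(1 - \rho^2)} \\[2ex]
0 & 0 & -\dfrac{\rho}{\sigma_1(1 - \rho^2)} & -\dfrac{\rho}{\sigma_2(1 - \rho^2)} & \dfrac{1 + \rho^2}{(1 - \rho^2)^2}
\end{vmatrix}$$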

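This confirms, what we know already, that the distribution of means is independent of
variances and covariances. We may consider the 2 × 2 block in the top left-hand corner
and the 3 × 3 block in the bottom right-hand corner separately. If the determinants
of these blocks are $\Delta_1$ and $\Delta_2$, we have

$$\Delta_1 = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho^2)}, \qquad \Delta_2 = \frac{4}{\sigma_1^2\sigma_2^2(1 - \rho^2)^3}.$$

The minors will be found to be given by

$$\begin{pmatrix}
\dfrac{4}{\sigma_1^2\sigma_2^4(1 - \rho^2)^4} & \dfrac{4\rho}{\sigma_1^3\sigma_2^3(1 - \rho^2)^4} & 0 & 0 & 0 \\[2ex]
\dfrac{4\rho}{\sigma_1^3\sigma_2^3(1 - \rho^2)^4} & \dfrac{4}{\sigma_1^4\sigma_2^2(1 - \rho^2)^4} & 0 & 0 & 0 \\[2ex]
0 & 0 & \dfrac{2}{\sigma_1^2\sigma_2^4(1 - \rho^2)^4} & \dfrac{2\rho^2}{\sigma_1^3\sigma_2^3(1 - \rho^2)^4} & \dfrac{2\rho}{\sigma_1^3\sigma_2^4(1 - \rho^2)^3} \\[2ex]
0 & 0 & \dfrac{2\rho^2}{\sigma_1^3\sigma_2^3(1 - \rho^2)^4} & \dfrac{2}{\sigma_1^4\sigma_2^2(1 - \rho^2)^4} & \dfrac{2\rho}{\sigma_1^4\sigma_2^3(1 - \rho^2)^3} \\[2ex]
0 & 0 & \dfrac{2\rho}{\sigma_1^3\sigma_2^4(1 - \rho^2)^3} & \dfrac{2\rho}{\sigma_1^4\sigma_2^3(1 - \rho^2)^3} & \dfrac{4}{\sigma_1^4\sigma_2^4(1 - \rho^2)^2}
\end{pmatrix}$$

Hence we find

$$\operatorname{var} \hat\alpha = \frac{\sigma_1^2}{n}, \qquad \operatorname{var} \hat\beta = \frac{\sigma_2^2}{n},$$

$$\operatorname{var} \hat\sigma_1 = \frac{\sigma_1^2}{2n}, \qquad \operatorname{var} \hat\sigma_2 = \frac{\sigma_2^2}{2n}, \qquad \operatorname{var} \hat\rho = \frac{(1 - \rho^2)^2}{n}.$$

These results are already familiar. We have further

$$\operatorname{cov}(\hat\sigma_1, \hat\sigma_2) = \frac{\rho^2\sigma_1\sigma_2}{2n}, \qquad \operatorname{cov}(\hat\alpha, \hat\beta) = \frac{\rho\,\sigma_1\sigma_2}{n}.$$

Hence the correlation between $\hat\sigma_1$ and $\hat\sigma_2$ is $\rho^2$, that between $\hat\alpha$ and $\hat\beta$ is $\rho$, and that between
$\hat\rho$ and $\hat\sigma_1$ or $\hat\sigma_2$ is $\rho/\sqrt{2}$.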
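The inversion of the 3 × 3 block may be verified without computing any determinant, by multiplying the block of the Hessian into the matrix of $n$ times the stated variances and covariances and confirming that the product is the unit matrix. In the sketch below the negative signs of the off-diagonal elements and the covariance of $\hat\rho$ with each $\hat\sigma$, namely $\rho\sigma(1 - \rho^2)/2n$, are taken as assumptions consistent with the correlation $\rho/\sqrt{2}$ quoted above; the numerical values of the parameters are arbitrary.

```python
def information_block(s1, s2, rho):
    """3 x 3 block of the Hessian (17.116) for (sigma_1, sigma_2, rho), per observation."""
    q = 1 - rho ** 2
    return [
        [(2 - rho ** 2) / (s1 ** 2 * q), -rho ** 2 / (s1 * s2 * q), -rho / (s1 * q)],
        [-rho ** 2 / (s1 * s2 * q), (2 - rho ** 2) / (s2 ** 2 * q), -rho / (s2 * q)],
        [-rho / (s1 * q), -rho / (s2 * q), (1 + rho ** 2) / q ** 2],
    ]

def covariance_block(s1, s2, rho):
    """n times the large-sample covariance matrix of the three estimators."""
    q = 1 - rho ** 2
    return [
        [s1 ** 2 / 2, rho ** 2 * s1 * s2 / 2, rho * s1 * q / 2],
        [rho ** 2 * s1 * s2 / 2, s2 ** 2 / 2, rho * s2 * q / 2],
        [rho * s1 * q / 2, rho * s2 * q / 2, q ** 2],
    ]

def matmul(a, b):
    """Plain 3 x 3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Arbitrary illustrative parameter values.
product = matmul(information_block(2.0, 0.5, 0.3), covariance_block(2.0, 0.5, 0.3))
for row in product:
    print(row)
```

That the product reduces to the unit matrix for any admissible parameter values is equivalent to the statement (17.115) for this block.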
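Example 17.19

Consider the Type III distribution

$$dF = \frac{1}{\sigma\,\Gamma(p)}\left(\frac{x - \alpha}{\sigma}\right)^{p-1} \exp\left\{-\frac{x - \alpha}{\sigma}\right\} dx.$$

For the likelihood we have

$$\log L = -n \log \sigma - n \log \Gamma(p) + (p - 1)\,\Sigma \log\frac{x - \alpha}{\sigma} - \Sigma\,\frac{x - \alpha}{\sigma}.$$

The three partial differential coefficients give

$$-(p - 1)\,\Sigma\,\frac{1}{x - \alpha} + \frac{n}{\sigma} = 0,$$

$$-\frac{np}{\sigma} + \frac{1}{\sigma^2}\,\Sigma(x - \alpha) = 0,$$

$$-n\,\frac{d}{dp}\log\Gamma(p) + \Sigma \log\frac{x - \alpha}{\sigma} = 0.$$

For the Hessian, taking the parameters in the order $\alpha$, $\sigma$, $p$, we have

$$\Delta = \begin{vmatrix}
\dfrac{1}{\sigma^2(p - 2)} & \dfrac{1}{\sigma^2} & \dfrac{1}{\sigma(p - 1)} \\[2ex]
\dfrac{1}{\sigma^2} & \dfrac{p}{\sigma^2} & \dfrac{1}{\sigma} \\[2ex]
\dfrac{1}{\sigma(p - 1)} & \dfrac{1}{\sigma} & \dfrac{d^2 \log \Gamma(p)}{dp^2}
\end{vmatrix}
= \frac{1}{(p - 2)\,\sigma^4}\left\{2\,\frac{d^2 \log\Gamma(p)}{dp^2} - \frac{2}{p - 1} + \frac{1}{(p - 1)^2}\right\}.$$

From this the sampling variances are found to be

$$\operatorname{var} \hat\alpha = \frac{1}{n\Delta\sigma^2}\left\{p\,\frac{d^2 \log\Gamma(p)}{dp^2} - 1\right\},$$

$$\operatorname{var} \hat\sigma = \frac{1}{n\Delta\sigma^2}\left\{\frac{1}{p - 2}\,\frac{d^2 \log\Gamma(p)}{dp^2} - \frac{1}{(p - 1)^2}\right\},$$

$$\operatorname{var} \hat{p} = \frac{2}{n\Delta\,(p - 2)\,\sigma^4}.$$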

Sufficient Estimators for Several Parameters 

17.47. As a natural generalisation from the case of one parameter we shall say that
$t_1 \ldots t_p$ are jointly sufficient for $\theta_1 \ldots \theta_p$ if, and only if, the likelihood function can
be expressed as

$$L(x_1 \ldots x_n, \theta_1 \ldots \theta_p) = L_1(t_1 \ldots t_p, \theta_1 \ldots \theta_p)\,L_2(x_1 \ldots x_n). \tag{17.123}$$

It evidently does not follow that if $\theta_2 \ldots \theta_p$ are known $t_1$ is sufficient for $\theta_1$. This will
be true only if the function $L_1$ may itself be factorised, e.g.

$$L_1(t_1 \ldots t_p, \theta_1 \ldots \theta_p) = L_{11}(t_1, \theta_1 \ldots \theta_p)\,L_{12}(t_2 \ldots t_p, \theta_2 \ldots \theta_p). \tag{17.124}$$

If a case occurred in which

$$L_1 = L_{11}(t_1, \theta_1)\,L_{12}(t_2, \theta_2) \ldots L_{1p}(t_p, \theta_p) \tag{17.125}$$

we might say that each $t$ was sufficient for the corresponding $\theta$ or that the set of $t$'s was
completely sufficient for the $\theta$'s. Such cases, however, are very rare.

Example 17.20 

From (17.113) it is evident that $\bar{x}$ and $s$ are jointly sufficient for $m$ and $\sigma$. If $\sigma$ is known
$\bar{x}$ is sufficient for $m$, but if $m$ is known $s$ is not sufficient for $\sigma$. The two are not completely
sufficient.


17.48. The properties of sufficient estimators may be proved true, with certain 
modifications, for several parameters, but we shall not take the subject further except 
to quote one result. 

If $f(x, \theta_1 \ldots \theta_p)$ is continuous and not zero over some continuous range of the $\theta$'s,
and exists, then it is necessary and sufficient for the existence of a set of jointly sufficient 
estimators that 


$$f = \exp\left\{\sum_{k=1}^{p} A_k X_k + B + Y\right\}, \tag{17.126}$$

where $A_k$ and $B$ are arbitrary functions of the $\theta$'s and $X_k$ and $Y$ of $x$. (See Koopman, 1936.)


Example 17.21 

The Type III distribution of Example 17.19 gives us

$$\log f = -p \log \sigma - \log \Gamma(p) + (p - 1)\log(x - \alpha) - \frac{x - \alpha}{\sigma}.$$

If $\alpha$ is regarded as known, this may be put in the form

$$-\frac{x - \alpha}{\sigma} + (p - 1)\log(x - \alpha) - p \log \sigma - \log \Gamma(p),$$

which is of type (17.126) with

$$A_1 = -\frac{1}{\sigma}, \qquad X_1 = x - \alpha,$$

$$A_2 = p - 1, \qquad X_2 = \log(x - \alpha),$$

$$B = -p \log \sigma - \log \Gamma(p).$$

Thus if $\alpha$ is known, there are sufficient estimators for $\sigma$ and $p$ jointly. It will be clear on
inspection that if $\alpha$ is unknown there are no sufficient estimators, even if $\sigma$ and $p$ are known.


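Parameters of Location and Scale

17.49. Consider a frequency function expressed in the form

$$dF = g\left(\frac{x - \alpha}{\beta}\right) d\left(\frac{x - \alpha}{\beta}\right). \tag{17.127}$$

The parameter $\alpha$ may be regarded as locating the distribution and $\beta$ as determining its
scale. In particular the normal distribution may be put in this form. We may write

$$dF = \exp \phi(\xi)\,d\xi = \exp \phi(\xi)\,\frac{dx}{\beta}, \tag{17.128}$$

where

$$\xi = \frac{x - \alpha}{\beta} \quad \text{and} \quad \phi(\xi) = \log g(\xi).$$

In samples of $n$ we have

$$\log L = \Sigma\,\phi - n \log \beta,$$

giving for the maximum likelihood estimators

$$\frac{\partial \log L}{\partial \alpha} = -\frac{1}{\beta}\,\Sigma(\phi') = 0, \tag{17.129}$$

$$\frac{\partial \log L}{\partial \beta} = -\frac{1}{\beta}\,\Sigma(\xi\phi' + 1) = 0, \tag{17.130}$$

whence we may solve for $\hat\alpha$ and $\hat\beta$.

For the variances and covariance we find

$$E\left(\frac{\partial^2 \log f}{\partial \alpha^2}\right) = \frac{1}{\beta^2}\,E(\phi''),$$

$$E\left(\frac{\partial^2 \log f}{\partial \alpha\,\partial \beta}\right) = \frac{1}{\beta^2}\,E(\xi\phi''),$$

$$E\left(\frac{\partial^2 \log f}{\partial \beta^2}\right) = \frac{1}{\beta^2}\,E(\xi^2\phi'' - 1),$$

and the Hessian of (17.116) becomes

$$\begin{vmatrix}
-\dfrac{E(\phi'')}{\beta^2} & -\dfrac{E(\xi\phi'')}{\beta^2} \\[2ex]
-\dfrac{E(\xi\phi'')}{\beta^2} & -\dfrac{E(\xi^2\phi'' - 1)}{\beta^2}
\end{vmatrix} \tag{17.131}$$

from which the variances and covariance of $\hat\alpha$ and $\hat\beta$ may be determined in the usual way.

In (17.131) it would be a great convenience if the quantity $-E(\xi\phi'')$ vanished, for
then $\hat\alpha$ and $\hat\beta$ would be independent. By a suitable choice of origin we can, in fact, ensure
that this is so. Put

$$\zeta = \xi - \frac{E(\xi\phi'')}{E(\phi'')}. \tag{17.132}$$

Then

$$E(\zeta\phi'') = E(\xi\phi'') - \frac{E(\xi\phi'')}{E(\phi'')}\,E(\phi''),$$

so that

$$E(\zeta\phi'') = 0.$$

With this origin we have for the variances of the (uncorrelated) variables $\hat\alpha$ and $\hat\beta$,

$$\operatorname{var} \hat\alpha = -\frac{\beta^2}{n\,E(\phi'')}, \tag{17.133}$$

$$\operatorname{var} \hat\beta = -\frac{\beta^2}{n\,E(\zeta^2\phi'' - 1)}. \tag{17.134}$$

The point of location so defined, namely, as that for which $\hat\alpha$ and $\hat\beta$ are uncorrelated, has
been called by Fisher the centre of location.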
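Example 17.22

For the normal distribution

$$dF = \frac{1}{\beta\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - \alpha}{\beta}\right)^2\right\} dx$$

we have $\phi = \text{constant} - \tfrac12\xi^2$,

$$E(\phi'') = -1 \quad \text{and} \quad E(\phi''\xi) = 0.$$

Hence $\zeta = \xi$, and the origin chosen is itself the centre of location. From (17.133) and
(17.134) we find the familiar results (for large samples)

$$\operatorname{var} \hat\alpha = \operatorname{var} \bar{x} = \frac{\beta^2}{n},$$

$$\operatorname{var} \hat\beta = \operatorname{var} s = \frac{\beta^2}{2n},$$

with $\bar{x}$ and $s$ uncorrelated.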


Example 17.23 

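Consider again the Type III distribution

$$dF = \frac{1}{\beta\,\Gamma(p)}\left(\frac{x - \alpha}{\beta}\right)^{p-1} \exp\left\{-\frac{x - \alpha}{\beta}\right\} dx, \qquad \alpha \le x \le \infty;\; p > 2,$$

where we assume $p$ known. The condition $p > 1$ is required to ensure the vanishing of
the frequency function at the extremity $x = \alpha$, and $p > 2$ to ensure the convergence of
some of the mean values.

Here

$$\phi = \text{constant} - \xi + (p - 1)\log \xi.$$

Hence

$$E(\xi\phi'') = E\left(-\frac{p - 1}{\xi}\right) = -1,$$

$$E(\xi^2\phi'') = E(-p + 1) = -(p - 1).$$

Thus

$$\zeta = \xi - (p - 2).$$

The centre of location is distant $(p - 2)$ to the right of the start of the distribution. In
terms of $\zeta$ we have

$$\phi = \text{constant} - \zeta - (p - 2) + (p - 1)\log(\zeta + p - 2),$$

$$\phi' = -1 + \frac{p - 1}{\zeta + p - 2}, \qquad \phi'' = -\frac{p - 1}{(\zeta + p - 2)^2},$$

$$E(\phi'') = -\frac{1}{p - 2},$$

$$E(\phi''\zeta^2 - 1) = -2.$$

Hence

$$\operatorname{var} \hat\alpha = \frac{\beta^2(p - 2)}{n}, \qquad \operatorname{var} \hat\beta = \frac{\beta^2}{2n}.$$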
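The mean values quoted in this example can be confirmed by quadrature over the standardised Type III form, whose density in $\xi$ is $\xi^{p-1}e^{-\xi}/\Gamma(p)$. The sketch below takes the hypothetical value $p = 5$ and a simple midpoint rule truncated at $\xi = 80$; both choices, and the function names, are ours and serve only for illustration.

```python
import math

def gamma_expectation(h, p, upper=80.0, n_steps=80000):
    """Midpoint-rule value of E[h(xi)] when xi has density xi^(p-1) e^(-xi) / Gamma(p)."""
    step = upper / n_steps
    total = 0.0
    for i in range(n_steps):
        xi = (i + 0.5) * step
        total += h(xi) * xi ** (p - 1) * math.exp(-xi) * step
    return total / math.gamma(p)

p = 5.0   # hypothetical value of the known parameter

def phi2(xi):
    """Second derivative of phi = constant - xi + (p - 1) log xi."""
    return -(p - 1) / xi ** 2

e_phi2 = gamma_expectation(phi2, p)                              # theory: -1/(p - 2)
e_xi_phi2 = gamma_expectation(lambda xi: xi * phi2(xi), p)       # theory: -1
shift = e_xi_phi2 / e_phi2                                       # zeta = xi - shift
e_zeta2_phi2_minus_1 = gamma_expectation(
    lambda xi: (xi - shift) ** 2 * phi2(xi) - 1, p)              # theory: -2
print(e_phi2, e_xi_phi2, shift, e_zeta2_phi2_minus_1)
```

With $p = 5$ the shift comes out as 3, that is $p - 2$, and the last mean value as $-2$, so that (17.133) and (17.134) give $\beta^2(p - 2)/n$ and $\beta^2/2n$ respectively.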
Efficiency of the Method of Moments

17.50. In previous chapters we have fitted distributions of the Pearson type to 
other distributions by identifying lower moments. We were there mainly concerned with 
the properties of populations only and no question of the reliability of estimates arose. 
If, however, we regard the data as a sample from a population, the question arises whether 
fitting by moments provides the most efficient estimators of the unknown parameters. 
As we shall see presently, in general it does not. 

Consider a parent form dependent on four parameters. If the maximum likelihood
estimators of these parameters are to be obtained in terms of linear functions of the moments
(as in the fitting of Pearson curves), we must have

$$\frac{\partial \log L}{\partial \theta} = a_0 + a_1\Sigma(x) + a_2\Sigma(x^2) + a_3\Sigma(x^3) + a_4\Sigma(x^4) \tag{17.135}$$

and consequently

$$f(x, \theta_1, \theta_2, \theta_3, \theta_4) = \exp\{b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4\}, \tag{17.136}$$

where the $b$'s depend on the $\theta$'s. This is the most general form for which the method of
moments gives maximum likelihood estimators. The $b$'s are, of course, conditioned by
the fact that the total frequency shall be unity and the distribution function converge.

Without loss of generality we may take $b_1 = 0$. If, then, the other $b$'s vanish except
$b_0$ and $b_2$ the distribution is normal and the method of moments is most-efficient. In
other cases, (17.136) does not yield a Pearson distribution except as an approximation.
For example,

$$\frac{\partial \log f}{\partial x} = 2b_2 x + 3b_3 x^2 + 4b_4 x^3.$$

If $b_3$ and $b_4$ are small this is approximately

$$\frac{\partial \log f}{\partial x} = \frac{2b_2 x}{1 - \dfrac{3b_3}{2b_2}\,x - \dfrac{2b_4}{b_2}\,x^2}, \tag{17.137}$$

which is one form of the equation defining Pearson distributions (cf. 6.2). Only when
$b_3$ and $b_4$ are small compared with $b_2$ can we expect the method of moments to give estimates
of high efficiency.


17.51. A detailed discussion of the efficiency of moments in determining the parameters
of a Pearson distribution has been given by Fisher (1921a). We will here quote
only one of the results by way of illustration.

We found in Example 17.19 that the variance for large samples of the maximum
likelihood estimator $\hat{p}$ is given by

$$\operatorname{var} \hat{p} = \frac{2}{n\left\{2\,\dfrac{d^2 \log\Gamma(p)}{dp^2} - \dfrac{2}{p - 1} + \dfrac{1}{(p - 1)^2}\right\}}$$

or, if $p' = p - 1$, by

$$\operatorname{var} \hat{p}' = \frac{2}{n\left\{2\,\dfrac{d^2 \log\Gamma(1 + p')}{dp'^2} - \dfrac{2}{p'} + \dfrac{1}{p'^2}\right\}}. \tag{17.138}$$

Now for large $p'$*

$$\log\Gamma(1 + p') = \tfrac12 \log 2\pi + (p' + \tfrac12)\log p' - p' + \frac{1}{12p'} - \frac{1}{360p'^3} + \frac{1}{1260p'^5} - \ldots$$

We then find

$$2\,\frac{d^2}{dp'^2}\log\Gamma(1 + p') - \frac{2}{p'} + \frac{1}{p'^2} = \frac{1}{3p'^3}\left\{1 - \frac{1}{5p'^2} + \frac{1}{7p'^4} - \ldots\right\}$$

and hence, approximately,

$$\operatorname{var} \hat{p}' = \frac{6}{n}\,(p'^3 + \tfrac15 p'). \tag{17.139}$$

If we estimate the parameters by equating sample-moments to the appropriate moments
in terms of parameters, we find

$$\alpha + \sigma p = m_1',$$

$$\sigma^2 p = m_2,$$

$$2p\sigma^3 = m_3,$$

so that, whatever $\alpha$ and $\sigma$ may be,

$$b_1 = \frac{m_3^2}{m_2^3} = \frac{4}{p}, \tag{17.140}$$

where $b_1$ is the sample value of $\beta_1$. Now for estimation by the method of moments (cf.
9.22),

$$\operatorname{var} b_1 = \frac{\beta_1}{n}\,(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1),$$

which for the present distribution is readily seen to reduce to

$$\operatorname{var} b_1 = \frac{96\,(p + 1)(p + 5)}{n\,p^3}. \tag{17.141}$$

Hence, from (17.140) we have for $\hat{p}$, estimated by the method of moments,

$$\operatorname{var} \hat{p} = \frac{p^4}{16}\operatorname{var} b_1 = \frac{6}{n}\,p\,(p + 1)(p + 5).$$

For large $p$ the efficiency of this estimator is then, from (17.139) with $p = 1 + p'$,

$$E = \frac{p'^3 + \tfrac15 p'}{(p' + 1)(p' + 2)(p' + 6)},$$

which is evidently short of unity in many cases. When $p'$ exceeds 38·1 ($\beta_1 = 0{\cdot}102$) the
efficiency is over 80 per cent. For $p' = 19$ ($\beta_1 = 0{\cdot}20$) it is 65 per cent. For $p' = 4$ a more
exact calculation based on the tables of the trigamma function $d^2 \log\Gamma(1 + p')/dp'^2$ shows
that the efficiency is only 22 per cent.
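The percentages just quoted can be recomputed without recourse to tables. The sketch below evaluates the trigamma function by the recurrence $\psi'(x) = \psi'(x + 1) + 1/x^2$ followed by its standard asymptotic series, and forms the ratio of (17.138) to the moment variance $6p(p + 1)(p + 5)/n$. The routine is our own and is offered as a check only, with $p'$ denoting the exponent and $p = p' + 1$ as above.

```python
import math

def trigamma(x):
    """d^2 log Gamma(x) / dx^2 for x > 0, by upward recurrence plus an asymptotic series."""
    total = 0.0
    while x < 10.0:
        total += 1.0 / (x * x)        # psi'(x) = psi'(x + 1) + 1 / x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return total + series

def efficiency_exact(p_prime):
    """Ratio of the maximum likelihood variance (17.138) to the moment variance."""
    p = p_prime + 1.0
    var_ml = 2.0 / (2.0 * trigamma(p) - 2.0 / p_prime + 1.0 / p_prime ** 2)  # times 1/n
    var_mm = 6.0 * p * (p + 1.0) * (p + 5.0)                                    # times 1/n
    return var_ml / var_mm

def efficiency_large_p(p_prime):
    """Large-p approximation obtained from (17.139)."""
    return (p_prime ** 3 + p_prime / 5.0) / (
        (p_prime + 1) * (p_prime + 2) * (p_prime + 6))

for value in (4.0, 19.0, 38.1):
    print(value, efficiency_exact(value), efficiency_large_p(value))
```

For $p' = 4$ the exact ratio is a little under 22 per cent, and the large-$p$ formula gives about 65 and 80 per cent at $p' = 19$ and $p' = 38{\cdot}1$, in agreement with the figures of the text.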


* The series for the log $\Gamma$ function is given in most books on advanced calculus, e.g. J. Edwards,
Integral Calculus, vol. 2, article 942.
NOTES AND REFERENCES

The greater part of this chapter is based on the researches of R. A. Fisher, the main
papers being those of 1921a, 1925b and 1934a. The idea of maximising likelihood may
be traced back to Gauss and was considered by Edgeworth, but may be regarded as beginning
to exercise an influence on statistical theory only with the publication of Fisher’s
first paper in 1912.

The theorem giving the limiting variances and covariances of maximum likelihood
estimates was proved (incorrectly) by Karl Pearson and Filon in 1898 before it was realised
that it applied only to maximum likelihood. The necessary correction was given by Edgeworth
(1908) and Fisher (1921a), but rigorous proofs were not available until the work of
Hotelling (1930) and Doob (1934a and b, 1935, 1936). In the text we have followed
Hotelling’s treatment.

The inefficiency of moments in fitting distributions, pointed out by Fisher (1921a),
has led to some controversy, for which see Koshal (1933, 1935), Myers (1934), Elderton
and Hansmann (1934), K. Pearson (1936), and Fisher (1937a). The reader who pursues
this subject so far as to read any one of these papers should read them all.

For work on sufficient estimators see Koopman (1936) and Pitman (1936, 1937b), who
independently obtained the general form of distribution admitting such estimators. The
theorem that sufficient estimators have the property 17.17 is due to Fisher, rigorous proofs
being provided by Neyman (1935a) and Dugue (1936a). Reference should also be made
to papers by Bartlett (1936a, b, 1937c, 1938b, 1939a, 1940) on the problem of several parameters
and what he calls “conditional” statistics, i.e. those similar to $s^2$ when $\bar{x}$ or some
other function of the sample values is regarded as known. See also Neyman and Pearson
(1936a).

Among recent papers, that by Pitman (1939a) on parameters of scale and location,
and that by Welch (1939c) on the distribution of maximum likelihood estimates, are
noteworthy.

Geary (1942) has recently proved a remarkable generalisation of the theorem that
in large samples maximum-likelihood estimators have minimum variance in the case of
one parameter. In fact, for several parameters the maximum likelihood estimators
minimise the “generalised variance” as defined in Chapter 28.


EXERCISES 


17.1. If $t$ is a most-efficient estimator and $t'$ a less-efficient estimator with efficiency
$E$, and if the correlation of $t$ and $t'$ is $\rho$, show by considering the estimator $t''$ defined by

$$(1 + E - 2\rho\sqrt{E})\,t'' = (1 - \rho\sqrt{E})\,t + (E - \rho\sqrt{E})\,t'$$

that $\rho = \sqrt{E}$ (for in the contrary case $\operatorname{var} t'' < \operatorname{var} t$).

(Fisher, 1925b.)


17.2. If in n trials of an event with probability p there are x successes, show that 
a maximum likelihood estimator of p is x/n. Find its sampling variance and show that 
it is sufficient. 

17.3. Show that the distribution

$$dF = \tfrac12 \exp\{-|x - \theta|\}\,dx, \qquad -\infty < x < \infty$$

has a likelihood function for a sample of $n$ which is a maximum at the median if $n$ is odd
and between the $(n/2)$th and $(n/2 + 1)$th members if $n$ is even.


17.4. For the distribution of the previous exercise show that for a sample of $(2m + 1)$
members the median has an accuracy
$$\frac{(m + 1)(2m + 1)}{m - 1} \left\{ 1 - \frac{(2m)!}{2^{2m-1} (m!)^2} \right\}.$$
Hence, as $m$ tends to infinity, the loss of information tends to $4\sqrt{m/\pi} - 4$. Thus,
although the median is most-efficient the loss of information in large samples does not
tend to a constant.

(Fisher, 1925b.)

17.5. Show that if a most-efficient estimator $A$ and a less-efficient estimator $B$ tend
to joint normality for large samples, $B - A$ tends to zero correlation with $A$.

Show that the error in $B$ may be regarded as composed (for large samples) of two
parts which are independent, the error in $A$ and the error in $B - A$. (The first may be
regarded as sampling error, necessarily inherent in the problem of estimation, the second
as error due to the inefficiency of the estimator.)

(Fisher, 1925b.)


17.6. Show that the distribution of the median in a sample of $(2m + 1)$ observations
from the population
$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty$$
is given by
$$dF = \frac{(2m + 1)!}{(m!)^2\, \pi^{2m+1}} \left( \frac{\pi^2}{4} - \phi^2 \right)^m \frac{dx}{1 + (x - \theta)^2},$$
where $\tan \phi = x - \theta$ and $|\phi| < \pi/2$.

Show hence that the accuracy of the median is 

L I 2 ”* ^ 'o 8 ' * + ('? - *')**}’ (t - *')** 


= i , 3m (2m_+ l) (m +J)! ( 2 \ m+ * 
* 2 (m — 1) Jr* 2m — 1 


(i)" + * {s^r J ~-- <*> - a7 "+* <*>} 




2m + 3 


’m+i 


(2n) 


} 


where $J_n(z)$ is the Bessel function of order $n$ and in particular $J_{1/2}(\pi) = J_{1/2}(2\pi) = 0$,
$J_{3/2}(\pi) = \sqrt{2}/\pi$, $J_{3/2}(2\pi) = -1/\pi$, and
$$J_{n+1}(z) = \frac{2n}{z} J_n(z) - J_{n-1}(z).$$


(Fisher, 1925b.)

17.7. Show that the most general continuous distribution for which the maximum-likelihood
estimator of a parameter $\theta$ is the geometric mean of the sample is
$$f(x, \theta) = \left( \frac{x}{\theta} \right)^{\theta\, \partial\psi/\partial\theta} \exp\{\psi(\theta) + \zeta(x)\},$$
where $\psi$ is an arbitrary function of $\theta$, and $\zeta$ of $x$. Show further that the corresponding
distribution giving the harmonic mean is

(Keynes, J.R.S.S. (1911), 74, 323.)

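17.8. Show that, if $m$ is known, the estimator
$$s = \left\{ \frac{1}{n} \Sigma (x - m)^2 \right\}^{1/2}$$
is sufficient for $\sigma$ in samples of $n$ from
$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2} (x - m)^2 \right\} dx,$$
and find its distribution by the method of 17.31.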

17.9. By considering the distribution
$$dF = e^{-(x - \theta)}\, dx, \qquad \theta \le x < \infty$$
show that the three forms of (17.97) are not necessarily equivalent when the range contains
the parameter to be estimated.

(Pitman, 1936.) 


17.10. Show that if the frequency function is continuous and is zero at an extreme 
which is a function of 0, there still exists a maximum to the intrinsic accuracy, defined 

(Pitman, 1936.) 


17.11. By considering the distribution
$$dF = \frac{2x\, dx}{2\theta + 1}, \qquad \theta \le x \le \theta + 1$$
show that the intrinsic accuracy is $4n^2/(2\theta + 1)^2$. Show further that the largest member
of the sample is sufficient for $\theta$ and that its distribution is
$$dF = \alpha(x)\, dx = \frac{2nx\, (x^2 - \theta^2)^{n-1}}{(2\theta + 1)^n}\, dx.$$
Hence show that
$$E\left( \frac{\partial \log \alpha}{\partial \theta} \right)^2 = \frac{4n^2 (\theta + 1)^2}{(2\theta + 1)^2} + \frac{4n\theta^2}{(n - 2)(2\theta + 1)^2},$$
so that the mean value in this case is greater than the intrinsic accuracy.

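(Pitman, 1936.)

17.12. If the frequency function of an estimator $t$ is $\Phi$ its accuracy is
$E\left\{ \left( \dfrac{1}{\Phi} \dfrac{\partial \Phi}{\partial \theta} \right)^2 \right\}$.
If every possible sample with frequency $\phi$ gave a different value of $t$ the accuracy would
be $E\left\{ \left( \dfrac{1}{\phi} \dfrac{\partial \phi}{\partial \theta} \right)^2 \right\}$ and would be independent of $t$. Show that the difference in accuracy
may be expressed as
$$E\left\{ \left( \frac{1}{\phi} \frac{\partial \phi}{\partial \theta} - \frac{1}{\Phi} \frac{\partial \Phi}{\partial \theta} \right)^2 \right\}$$
and hence is not negative.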

Hence show that the efficiency as defined in 17.36 cannot exceed unity, at least if the 
range is independent of 0. 


(Fisher, 1925b.)


17.13. Show that
$$dF = \frac{\theta_2\, dx}{\pi \{\theta_2^2 + (x - \theta_1)^2\}}, \qquad -\infty < x < \infty$$
does not admit of a sufficient estimator for either parameter if the other is known, or
a pair of jointly sufficient estimators if both are unknown.

(Koopman, 1936.) 

17.14. Show that if a distribution admits a sufficient estimator for either of two 
parameters when the other is known, it admits of a pair of jointly sufficient estimators 
when both parameters are unknown. 

(Koopman, 1936.) 

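17.15. Show that the centre of location of the Type IV distribution
$$dF \propto e^{-\nu \tan^{-1} \{(x - \alpha)/\beta\}} \left\{ 1 + \frac{(x - \alpha)^2}{\beta^2} \right\}^{-\frac{p + 2}{2}} dx, \qquad -\infty < x < \infty$$
where $\nu$ and $p$ are assumed known, is distant
$$\frac{\nu\beta}{p + 4}$$
to the left of the mode of the distribution.
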


(Fisher, 1921a.) 


17.16. For the distribution 


0i -\ <0i +| 2 


show that, in large samples, the mean tends to the form 

(-Tf) 48 ' 

Show further that the distribution of the centre of the sample, say c (the mean of the two 
extreme values), tends to 


Hence
$$\frac{\operatorname{var} c}{\operatorname{var} \bar{x}} = \frac{6}{n},$$

so that the centre is a far better estimator of location than the mean for this distribution. 

(Fisher, 1921a.)

17.17. Show that for the Type I distribution
$$dF = \frac{1}{B(p, q)}\, x^{p-1} (1 - x)^{q-1}\, dx, \qquad 0 \le x \le 1$$
the geometric mean of the sample values $x$ and that of the values $(1 - x)$ are jointly
sufficient for the estimation of $p$ and $q$.

17.18. Show that all the Pearson distributions have sufficient estimators for some 
of the parameters if the others are assumed known, and ascertain which are the parameters 
concerned for each type. 

17.19. For the distribution of Exercise 17.15 show that the intrinsic accuracy for $\alpha$ is
$$\frac{1}{\beta^2} \cdot \frac{(p + 1)(p + 2)(p + 4)}{(p + 4)^2 + \nu^2}$$
and that the efficiency of the method of moments in locating the curve is
$$\frac{p^2 (p - 1) \{(p + 4)^2 + \nu^2\}}{(p + 1)(p + 2)(p + 4)(p^2 + \nu^2)}.$$

(Fisher, 1921a.) 





CHAPTER 18 

ESTIMATION: MISCELLANEOUS METHODS 


Minimum Variance 

18.1. We have seen in the previous chapter that under certain general conditions 
the maximum likelihood estimator is most-efficient for large samples, and that for finite 
samples it leads to sufficient estimators where such exist. Sufficient estimators themselves 
contain all the information in the sample about the parameter under estimate. What 
we have not shown, however, is that maximum likelihood estimators have minimum variance 
in finite samples. 

We now consider the subject from a slightly different standpoint. Instead of beginning
with the criteria of efficiency and sufficiency and showing that they lead to certain
minimal properties, we shall examine the class of estimators which (a) are unbiassed and
(b) have minimum variance. The minimal property is here taken as the starting-point.


18.2. Consider, then, a frequency function $f(x, \theta)$, and as usual let us write
$L = f(x_1, \theta) \ldots f(x_n, \theta)$. Then, writing $\int dx$ for the $n$-fold integral over the range
of the $x$'s, we have to find $t = t(x_1, \ldots, x_n)$ such that
$$\int t L\, dx = \theta, \tag{18.1}$$
$$\int (t - \theta)^2 L\, dx = \text{minimum}. \tag{18.2}$$
The first equation may also be written
$$\int (t - \theta) L\, dx = 0. \tag{18.3}$$
The problem of finding $t$ is one of the familiar problems in the Calculus of Variations. The
minimal value of (18.2) has to be found subject to the condition (18.1), which is equivalent to
$$\int t \frac{\partial L}{\partial \theta}\, dx = 1, \tag{18.4}$$
provided that the range of $f$ is independent of $\theta$ or that $f$ vanishes at any extreme which
depends on $\theta$.

If $2\lambda$ is an unspecified parameter (which may depend on $\theta$ but not on the $x$'s) the
problem is equivalent to finding an unconditioned minimum of
$$\int \left\{ (t - \theta)^2 L - 2\lambda t \frac{\partial L}{\partial \theta} \right\} dx. \tag{18.5}$$
The solution is*
$$(t - \theta) L - \lambda \frac{\partial L}{\partial \theta} = 0. \tag{18.6}$$
We then have
$$t = \theta + \frac{\lambda}{L} \frac{\partial L}{\partial \theta}, \tag{18.7}$$
where $t$ is a function of the $x$'s but not of $\theta$. Thus there exists a $t$ satisfying our conditions
if we can express $\dfrac{\partial \log L}{\partial \theta}$ in the form
$$\frac{\partial \log L}{\partial \theta} = \frac{t - \theta}{\lambda}. \tag{18.8}$$
This is a necessary and sufficient condition, except that it gives only stationary values of
(18.2) which might, for instance, be maxima instead of minima. This is not a point,
however, which need detain us from the statistical viewpoint, troublesome as it is to the
mathematician.

* See, for example, J. Edwards, Integral Calculus, vol. 2, article 1504, or A. R. Forsyth, Calculus
of Variations, article 15. Since the expression to be minimised does not contain $\partial t/\partial x$, the Euler equation
for a stationary value to the integral $\int V\, dx$ reduces to $\partial V/\partial t = 0$. The derivation of (18.7) is not,
however, without its difficulties, and I think some conditions have been accidentally suppressed in
the Aitken-Silverstone method. I understand that Dr. Leon Solomon, working with Dr. Aitken, has
obtained a proof which depends on the fact that $L$ shall be the product of $n$ independent frequency
functions. But for the war the point would doubtless have been cleared up by now, but at present
it remains open.


Example 18.1

To estimate $\theta$ in the normal population
$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \theta)^2 \right\} dx, \qquad -\infty < x < \infty$$
where $\sigma$ is assumed known.

We have
$$\frac{\partial \log L}{\partial \theta} = \frac{n}{\sigma^2} (\bar{x} - \theta).$$
This can be put in the form (18.8) by taking
$$\bar{x} = t \quad \text{and} \quad \lambda = \frac{\sigma^2}{n},$$
and hence $\bar{x}$ is the required estimator. We note that it has minimum variance for any
$n$ in the class of unbiassed estimators of $\theta$.


Example 18.2

To estimate $\theta$ in
$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$
We have
$$\frac{\partial \log L}{\partial \theta} = 2 \Sigma \left\{ \frac{x - \theta}{1 + (x - \theta)^2} \right\}.$$
This cannot be put in the form (18.8) and the method fails. There is no estimator which
is unbiassed and has minimum variance.


18.3. Integrating (18.8) with respect to $\theta$ we have
$$\log L = \alpha(\theta)(t - \theta) + \beta(\theta) + \Sigma\, \gamma(x_j),$$
where $\alpha$, $\beta$, $\gamma$ are arbitrary functions (apart from the fact that the two former depend on
$\lambda$). Hence
$$\log f(x, \theta) = A(\theta)(t - \theta) + B(\theta) + C(x) = p(\theta)\, t(x) + q(\theta) + r(x), \text{ say}. \tag{18.9}$$
Comparing this with (17.83), we see that the method of minimum variance will give a
solution only if there exists a sufficient estimator. This explains the success of the method
in Example 18.1 (where $\bar{x}$ is sufficient) and its failure in Example 18.2 (where no sufficient
estimator exists).
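
The contrast between the two examples can be checked numerically. What follows is a minimal sketch, not part of the original text: the function names and the sample values are illustrative assumptions. For the normal case the score agrees with $(\bar{x} - \theta)/\lambda$, $\lambda = \sigma^2/n$, at every trial $\theta$. For the Cauchy case the score of the sample $(-10, 10)$ vanishes at $\theta = 0$ and again between 9 and 10, whereas the form (18.8), with $\lambda$ a positive variance, could vanish only at the single point $\theta = t$.

```python
def normal_score(xs, theta, sigma):
    """d log L / d theta for a normal sample with known sigma."""
    return sum(x - theta for x in xs) / sigma**2


def cauchy_score(xs, theta):
    """d log L / d theta for the Cauchy population of Example 18.2."""
    return 2.0 * sum((x - theta) / (1.0 + (x - theta) ** 2) for x in xs)


# Illustrative sample and known sigma (assumed values, not from the text).
xs = [1.2, -0.7, 3.1, 0.4, 2.5]
sigma = 2.0
n = len(xs)
xbar = sum(xs) / n
lam = sigma**2 / n  # lambda of Example 18.1

# Normal case: the score equals (xbar - theta) / lam for every theta tried.
normal_gaps = [
    abs(normal_score(xs, th, sigma) - (xbar - th) / lam)
    for th in (-3.0, 0.0, 1.3, 5.0)
]

# Cauchy case: the score changes sign more than once for a widely spread sample.
wide = [-10.0, 10.0]
cauchy_values = {th: cauchy_score(wide, th) for th in (0.0, 1.0, 9.0, 10.0)}
```

The sign change of `cauchy_values` between 9 and 10, together with the zero at 0, is what rules out a representation of the form (18.8) for this sample.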


18.4. In the method of maximum likelihood it makes no difference to the final
result whether we estimate for a parameter $\theta$ or for some other parameter $\chi$ functionally
related to $\theta$. For
$$\frac{\partial \log L}{\partial \theta} = \frac{\partial \log L}{\partial \chi} \frac{\partial \chi}{\partial \theta}$$
and the two sides of the equation vanish together. In the method of minimum variance,
however, there is an interesting difference.

Suppose we wish to estimate $\theta$ in
$$dF = \frac{1}{\sqrt{2\pi\theta}} \exp\left( -\frac{x^2}{2\theta} \right) dx, \qquad -\infty < x < \infty.$$
We have
$$\frac{\partial \log L}{\partial \theta} = -\frac{n}{2\theta} + \frac{\Sigma (x^2)}{2\theta^2},$$
and this may be put in the form (18.8) with
$$t = \frac{1}{n} \Sigma (x^2) \quad \text{and} \quad \lambda = \frac{2\theta^2}{n}.$$
If, however, we consider the parallel problem of estimating $\sigma$ in
$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \frac{x^2}{\sigma^2} \right) dx, \qquad -\infty < x < \infty$$
we find
$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\Sigma (x^2)}{\sigma^3},$$
which cannot be put in the form (18.8). We thus reach the peculiar result that the method
will provide an estimator for $\sigma^2$ but not for $\sigma$. It follows that in general we may have
to estimate, not $\theta$ itself, but some function of $\theta$, say $\tau(\theta)$.
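
The reason for the failure in the second parametrisation can be made explicit by factorising the two scores. The following is offered only as a sketch of the reasoning; the symbols $T$ and $c$ are introduced here for illustration and do not occur in the text. Writing $t = \frac{1}{n} \Sigma (x^2)$,

$$\frac{\partial \log L}{\partial \theta} = \frac{n}{2\theta^2} (t - \theta), \qquad \frac{\partial \log L}{\partial \sigma} = \frac{n}{\sigma^3} (t - \sigma^2).$$

The first is of the form (18.8) with $\lambda = 2\theta^2/n$. For the second to be of that form we should need a statistic $T$, free of $\sigma$, with

$$T - \sigma = \lambda(\sigma)\, \frac{n}{\sigma^3} (t - \sigma^2).$$

Differentiating with respect to $t$ shows that $n\lambda(\sigma)/\sigma^3$ would have to be a constant $c$, so that $T = \sigma + c\,(t - \sigma^2)$, and no choice of $c$ removes $\sigma$ from the right-hand side.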


18.5. If a minimum-variance estimator exists for some $\tau(\theta)$ we have
$$\frac{\partial \log L}{\partial \tau} = \frac{t - \tau}{\lambda(\tau)},$$
which is equivalent to
$$\frac{\partial \log L}{\partial \theta} = \frac{\partial \tau}{\partial \theta} \cdot \frac{t - \tau}{\lambda(\theta)}. \tag{18.10}$$
We estimate $\tau$ by putting it equal to $t$ and thus we shall have, for the estimator,
$$\frac{\partial \log L}{\partial \theta} = 0. \tag{18.11}$$
This is equivalent to the equation of maximum likelihood. The two are not, however,
identical. Maximum likelihood is not concerned with the existence of the function $\lambda$.
Minimum variance takes the function as fundamental, and when it exists the solution
(which is the same as the maximum likelihood solution) has minimum variance for all $n$
in the class of unbiassed estimators, not merely for large $n$.


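18.6. Let us suppose that $\theta$ is the parameter (transformed if necessary) for which
the estimating function is $\theta$ itself. Then we have for the minimum-variance estimator $t$
$$\operatorname{var} t = \int (t - \theta)^2 L\, dx,$$
which, on substitution from (18.8), yields
$$\operatorname{var} t = \lambda^2 \int \left( \frac{\partial \log L}{\partial \theta} \right)^2 L\, dx \tag{18.12}$$
$$\phantom{\operatorname{var} t} = -\lambda^2 \int \frac{\partial^2 \log L}{\partial \theta^2} L\, dx \tag{18.13}$$
if the range is independent of $\theta$ or $f$ vanishes at any extreme dependent on $\theta$.
Now from (18.8) we find
$$\frac{\partial^2 \log L}{\partial \theta^2} = -\frac{1}{\lambda} - \frac{t - \theta}{\lambda^2} \frac{\partial \lambda}{\partial \theta},$$
and hence, substituting in (18.13) and remembering that $\int (t - \theta) L\, dx = 0$, we find
$$\operatorname{var} t = \lambda. \tag{18.14}$$
The variance of the minimum-variance estimator is thus simply the parameter $\lambda$. It also
follows from (18.13) that
$$\operatorname{var} t = -\frac{1}{E\left( \dfrac{\partial^2 \log L}{\partial \theta^2} \right)}, \tag{18.15}$$
so that the result we reached in Chapter 17, as a limiting form for large $n$, is now seen to
be exact for finite $n$ under present conditions.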

Example 18.3

To estimate $\theta$ in the Type III form
$$dF = \frac{1}{\Gamma(p)\, \theta^p}\, x^{p-1} e^{-x/\theta}\, dx, \qquad 0 \le x < \infty, \quad p > 1,$$
where $p$ is assumed known.

We have
$$\frac{\partial \log L}{\partial \theta} = -\frac{np}{\theta} + \frac{n\bar{x}}{\theta^2},$$
which is of the form (18.8) if
$$t = \frac{\bar{x}}{p}, \qquad \lambda = \frac{\theta^2}{np}.$$
Thus $t$ is the minimum-variance estimator and has variance $\dfrac{\theta^2}{np}$ for finite $n$, even though
the distribution is not normal. (Compare Example 17.8.)
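
The finite-sample statement can be checked by simulation. The sketch below is an illustration only and is not drawn from the text: the parameter values, the seed and the helper name `simulate_t` are assumptions. It draws repeated samples of $n = 5$ from the Type III form with $p = 2$, $\theta = 3$ and compares the empirical mean and variance of $t = \bar{x}/p$ with $\theta$ and $\lambda = \theta^2/(np)$.

```python
import random
import statistics


def simulate_t(n, p, theta, reps, seed=12345):
    """Return `reps` simulated values of t = xbar / p for Type III samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # random.gammavariate(alpha, beta) has shape alpha and scale beta,
        # i.e. density proportional to x**(alpha - 1) * exp(-x / beta).
        xbar = sum(rng.gammavariate(p, theta) for _ in range(n)) / n
        values.append(xbar / p)
    return values


n, p, theta = 5, 2.0, 3.0          # assumed illustrative values
ts = simulate_t(n, p, theta, reps=20000)

mean_t = statistics.mean(ts)       # should be close to theta (unbiassed)
var_t = statistics.pvariance(ts)   # should be close to lam
lam = theta**2 / (n * p)           # theoretical variance from (18.14)
```

With these settings the empirical variance should agree with `lam` to within ordinary Monte Carlo error, although $n$ is only 5.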


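18.7. We may readily determine what function $\tau(\theta)$ should be taken as the estimating
function. Taking the general form from (18.9),
$$\log f(x, \theta) = p(\theta)\, t(x) + q(\theta) + r(x),$$
we have
$$\log L = p\, \Sigma\, t(x) + nq + \Sigma\, r(x). \tag{18.16}$$
Hence, if
$$\tau = -\frac{dq}{dp} = -\frac{dq/d\theta}{dp/d\theta}, \tag{18.17}$$
we have
$$\frac{\partial \log L}{\partial \tau} = \frac{\dfrac{1}{n} \Sigma\, t(x) - \tau}{1 \Big/ \left( n \dfrac{dp}{d\tau} \right)}, \tag{18.18}$$
which is of the required form provided that
$$\frac{1}{\lambda} = n \frac{dp}{d\tau}. \tag{18.19}$$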


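Example 18.4

Consider again the estimation of $\sigma$ in
$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \frac{x^2}{\sigma^2} \right) dx, \qquad -\infty < x < \infty.$$
We have
$$\log f = -\tfrac{1}{2} \log (2\pi) - \log \sigma - \frac{1}{2} \frac{x^2}{\sigma^2},$$
whence
$$p(\sigma) = -\frac{1}{2\sigma^2}, \qquad t(x) = x^2, \qquad q = -\log \sigma.$$
Thus the appropriate value of $\tau$, from (18.17), is
$$\tau = -\frac{dq/d\sigma}{dp/d\sigma} = \sigma^2,$$
which is thus determined as our estimating function. For the variance of the estimator
of $\tau$ we have
$$\lambda = 1 \Big/ \left( n \frac{\partial p}{\partial \tau} \right) = \frac{2\sigma^4}{n},$$
the estimator itself being $\dfrac{1}{n} \Sigma (x^2)$.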

Minimum $\chi^2$


18.8. We now turn to consider another principle which has been suggested for providing
estimators. If the data are grouped into cells with expected frequency typified
by $\lambda_j$ and observed frequency by $l_j$, then the function
$$\chi^2 = \Sigma \frac{(l_j - \lambda_j)^2}{\lambda_j}, \tag{18.20}$$
where
$$n = \Sigma (\lambda_j) = \Sigma (l_j), \tag{18.21}$$
can, as we saw in Chapter 12, be used as a measure of closeness of fit. The method of
minimum $\chi^2$ adopts this standpoint (which is, of course, arbitrary in the logical sense)
and attempts to determine the parameters $\lambda$ such that $\chi^2$ is a minimum.

In practice the method is not very easy to apply because of the difficulty of expressing
the $\lambda$'s in terms of the parameter under estimate, $\theta$. For some illustrations reference
may be made to Kirstine Smith (1916). We shall not consider the method at length
here for two reasons:

(a) it may be shown that for large samples the minimum-$\chi^2$ estimator tends to
the maximum-likelihood estimator;

(b) there is a modification of the method, considered below, which is much easier
to apply.


18.9. For samples of fixed size $n$ the distribution of the quantities $l_j$ is multinomial,
and we have for the likelihood function
$$L = \frac{n!}{\Pi (l_j!)} \Pi \left( \frac{\lambda_j}{n} \right)^{l_j}. \tag{18.22}$$
Thus
$$\log L = \text{constant} + \Sigma\, l_j \log \frac{\lambda_j}{l_j}. \tag{18.23}$$
Now for large samples we may put
$$\lambda_j = l_j + a_j n^{1/2},$$
where $a_j$ is finite and therefore small compared with $l_j$; $|a_j n^{1/2}| < l_j$; and $\Sigma (a_j) = 0$.
Hence, from (18.23),
$$\log L = k + \Sigma\, l_j \log \left( 1 + \frac{a_j n^{1/2}}{l_j} \right) = k - \frac{1}{2} \Sigma \frac{a_j^2 n}{l_j} + O(n^{-1/2}). \tag{18.24}$$
Now write
$$\chi'^2 = \Sigma \frac{a_j^2 n}{l_j} = \Sigma \frac{(\lambda_j - l_j)^2}{l_j}. \tag{18.25}$$
Then we see that, to order $n^{-1/2}$, $L$ is maximised by minimising $\chi'^2$. This latter quantity
is not the same as $\chi^2$ because the denominator terms are $l$'s instead of $\lambda$'s. However, for
large $n$ the difference is of order $n^{-1/2}$, for
$$\chi^2 - \chi'^2 = \Sigma \frac{(l_j - \lambda_j)^3}{l_j \lambda_j} = O(n^{-1/2}).$$
Hence, to order $n^{-1/2}$ the estimates obtained by minimising either $\chi^2$ or $\chi'^2$ will be equivalent
to maximising $L$.


18.10. The advantage of using $\chi'^2$ instead of $\chi^2$ in practice resides in the fact that
the denominators in the former are integral. However, if there are any empty cells (i.e.
those for which $l_j = 0$) the formula (18.25) requires some modification.

In the likelihood function, if $l_j = 0$, $(\lambda_j/n)^{l_j} = 1$ for all $\lambda_j$. The substitution
$$\lambda_j = l_j + a_j n^{1/2}$$
will give us, for the empty cells, a term in (18.24) equal to $-\Sigma\, a_j n^{1/2} = -\Sigma\, \lambda_j = -M$,
say. Hence we have
$$\chi'^2 = \Sigma \frac{(\lambda_j - l_j)^2}{l_j} + 2M, \tag{18.26}$$
where the summation takes place over occupied cells and $M$ is the sum of the theoretical
frequencies $\lambda$ in the empty cells.
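
Formula (18.26) translates directly into a short routine. The sketch below is illustrative only: the function names and the three-cell figures are assumptions chosen to exercise the empty-cell term, and ordinary $\chi^2$ of (18.20) is included for comparison.

```python
def modified_chi_squared(observed, expected):
    """chi'^2 of (18.26): squared deviations over observed counts in occupied
    cells, plus twice the total expectation M in the empty cells."""
    occupied = sum(
        (lam - l) ** 2 / l for l, lam in zip(observed, expected) if l > 0
    )
    m = sum(lam for l, lam in zip(observed, expected) if l == 0)
    return occupied + 2.0 * m


def pearson_chi_squared(observed, expected):
    """Ordinary chi^2 of (18.20), with expectations in the denominators."""
    return sum((l - lam) ** 2 / lam for l, lam in zip(observed, expected))


# Hypothetical cell counts with one empty cell (not data from the text).
observed = [12, 0, 8]
expected = [10.0, 2.0, 8.0]

chi_prime = modified_chi_squared(observed, expected)  # (10-12)^2/12 + 2*2
chi_plain = pearson_chi_squared(observed, expected)   # 4/10 + 4/2 + 0
```

The empty cell contributes $2M = 4$ to $\chi'^2$ here, where the unmodified sum (18.25) would have required a division by zero.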


Example 18.5

As an example (Jeffreys, 1941) we consider a case where the maximum likelihood
estimator is known, so that a comparison may be made with the result given by
minimum $\chi'^2$.

Col. (2) of the following table shows the frequency of women in the first class of Part II
of the Mathematical Tripos from 1910 to 1938 inclusive. Assuming that this distribution
follows the Poisson distribution
$$\frac{e^{-\theta} \theta^j}{j!},$$
to estimate $\theta$.

    (1)          (2)          (3) Expectation λ               (4) χ'²
    Number of    Frequency
    firsts, j    l            θ = 1    θ = 1.5   θ = 2        θ = 1      θ = 1.5    θ = 2

    0             6           10.7     6.5       3.9          3.7        0.0        0.7
    1             8           10.7     9.7       7.9          0.9        0.4        0.0
    2            11            5.3     7.3       7.9          3.0        1.2        0.9
    3             3            1.8     3.6       5.2          0.5        0.1        1.6
    4             0            0.5     1.4       2.6          -          -          -
    5             1            0.1     0.4       1.0          0.8        0.4        0.0
    over 5        0            0.0     0.1       0.5          2M = 1.0   2M = 3.0   2M = 6.2

    Totals       29                                           9.9        5.1        9.4

The sample mean (a sufficient estimator of $\theta$) is in this case 44/29 = 1.52 with a standard
error $\sqrt{\theta/n} = 0.23$.

To apply minimum $\chi'^2$ we have to express the theoretical frequencies in terms of $\theta$.
This results in an unmanageable equation if we then substitute in $\chi'^2$. Instead we calculate
the minimum by finding $\chi'^2$ for some trial values of $\theta$ (in this case 1, 1.5 and 2) and
then interpolating.

The expectations $\lambda$ for the three selected values of $\theta$ are shown in column (3) of the
table and the corresponding $\chi'^2$ in column (4). It is found that, writing $\theta = 1.5 + \phi$,
the values of $\chi'^2$ may be represented by the quadratic
$$\chi'^2 = 5.1 - 0.5\phi + 18.2\phi^2.$$
The minimum of this is given by $\phi = 0.01$, and hence our estimate of $\theta$ is 1.51, very close
to the value of 1.52 given by the maximum likelihood estimator.
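
The arithmetic of this example can be retraced mechanically. The sketch below is an illustration under stated assumptions, not the method as originally computed: the helper names are hypothetical, the expectations are carried unrounded (so the three values of $\chi'^2$ may differ slightly from the rounded figures in the table), and the quadratic is passed exactly through the three trial points.

```python
import math

# Observed frequencies for j = 0, 1, ..., 5 and "over 5" (column 2 of the table).
observed = [6, 8, 11, 3, 0, 1, 0]
n = sum(observed)  # 29 years


def poisson_expectations(theta, n, top=5):
    """Expected frequencies n * P(j) for j = 0..top, plus the tail cell j > top."""
    probs = [math.exp(-theta) * theta**j / math.factorial(j) for j in range(top + 1)]
    probs.append(1.0 - sum(probs))
    return [n * p for p in probs]


def modified_chi_squared(observed, expected):
    """chi'^2 with the empty-cell correction 2M of (18.26)."""
    occupied = sum((lam - l) ** 2 / l for l, lam in zip(observed, expected) if l > 0)
    m = sum(lam for l, lam in zip(observed, expected) if l == 0)
    return occupied + 2.0 * m


# chi'^2 at the three trial values theta = 1, 1.5, 2.
trial = [1.0, 1.5, 2.0]
values = [modified_chi_squared(observed, poisson_expectations(th, n)) for th in trial]

# Quadratic y = c + b*phi + a*phi^2 through the points phi = -h, 0, +h.
h = 0.5
y_minus, y_zero, y_plus = values
b = (y_plus - y_minus) / (2 * h)
a = (y_plus + y_minus - 2 * y_zero) / (2 * h * h)
theta_hat = 1.5 - b / (2 * a)  # location of the minimum

# Maximum likelihood estimate: the sample mean of the number of firsts.
sample_mean = sum(j * l for j, l in enumerate(observed[:6])) / n
```

Under these assumptions `theta_hat` comes out close to 1.51 and `sample_mean` is 44/29, in line with the comparison drawn in the example.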

18.11. On theoretical grounds there seems no reason to use minimum $\chi^2$ instead of
maximum likelihood. The method has some practical value, however, where the maximum
likelihood equations are difficult to solve. We can usually follow the device of the
example just given, find $\chi^2$ or $\chi'^2$ for some trial values of the parameter, and approximate
to the value which minimises $\chi^2$ or $\chi'^2$. Whether this is easier than finding the maximum
likelihood estimate in the same sort of way depends on the circumstances of the case, but
it may well be so when the frequency function is a tabulated integral, so that expected
frequencies for specified parameter-values can be readily obtained.

18.12. In the manner of 17.39 we can estimate the loss of information occasioned
by the use of minimum $\chi^2$. We have, for the minimum of $\chi^2$,
$$\frac{\partial}{\partial \theta} \Sigma \frac{(l - \lambda)^2}{\lambda} = 0,$$
which reduces to
$$\Sigma \frac{l^2 - \lambda^2}{\lambda^2} \frac{\partial \lambda}{\partial \theta} = 0. \tag{18.27}$$
Since $\dfrac{l + \lambda}{\lambda}$ tends to the constant value 2 for large samples, this is equivalent to the
maximum likelihood equation
$$\Sigma \frac{l - \lambda}{\lambda} \frac{\partial \lambda}{\partial \theta} = 0, \tag{18.28}$$
confirming that maximum likelihood and minimum $\chi^2$ give the same results in the limit.
Since 

$$l^2 - \lambda^2 = 2\lambda (l - \lambda) + (l - \lambda)^2$$

the deviation of — from its 
30 




mean is 

Z 2 - A 2 3A 
A 2 ' 30 




(Z - A) 2 3A 
A 2 30* 


. (18.29) 


the first term vanishing on summation. As in 17.39 we find the variance of this quantity 

3 log L 

within samples for which —~ 

30 


is constant. We have 


var Z k (Z - A) 2 = 2 Z (k 2 A 2 ) - - £* (fcA 2 ) 

n 


Z 2 (kX' 2 ) 

* > V2\ ’ 


27 s 


A' 


(?) 


and on substituting k = we find 


27 2 


r* 


(S). 

(?) 


(18.30) 


giving the loss of information. 

As the sample size increases, this quantity remains finite. It is interesting to observe, 
however, that as the number of classes increases it also increases without limit, indicating 
that minimum $\chi^2$ breaks down for fine grouping.


“ Inverse ” Probability 

18.13. According to Bayes' theorem (7.24), if $h(\theta)\, d\theta$ is the prior probability of $\theta$,
the posterior probability is given by
$$P(\theta \mid x_1, \ldots, x_n) \propto L(x_1, \ldots, x_n, \theta)\, h(\theta)\, d\theta. \tag{18.31}$$
It is then easy to determine the “most probable” value of $\theta$ by maximising $L\, h(\theta)$ if we
know $h(\theta)$. The principles of inference with which we have been concerned up to the
present do not require the notion of the probability of $\theta$ and, even if they did, would not
give any guide to the nature of the function $h(\theta)$. In fact, to an adherent of the frequency
theory of probability, the prior probability of $\theta$ requires the distribution of $\theta$ in some form,
and if $\theta$ is merely an unknown constant it has no distribution (except the trivial one that
$f = 1$ when $\theta$ takes its true value and $f = 0$ elsewhere). The alternative school of thought
assumes the existence of $h(\theta)$ as denoting a prior measure of belief, but, in order to find
the most probable value of $\theta$, has to make some further assumption as to its values comparable
to Bayes' postulate that for a finite range $h$ is a constant.

We have already noted that on this assumption the maximisation of L is equivalent 
to finding the value of 0 with the greatest posterior probability. It is also interesting to 
note that, whatever the form of h (0), maximum likelihood tends to give the same estimator 
as the method of maximising posterior probability for large $n$. In fact, for the maximisation
of $P$ in (18.31) we have
$$\frac{\partial \log P}{\partial \theta} = \frac{\partial \log L}{\partial \theta} + \frac{\partial \log h}{\partial \theta} = 0. \tag{18.32}$$
In ordinary cases the variance of $\dfrac{\partial \log L}{\partial \theta}$ is of order $n$, whereas the second term is independent
of $n$. In the limit, therefore, the second term is negligible and we are reduced to
the likelihood equation
$$\frac{\partial \log L}{\partial \theta} = 0.$$

Least Squares 

18.14. The method of least squares bears an analogy to minimum $\chi^2$. Suppose
we have an expression depending on a number of unknown parameters $\theta_1 \ldots \theta_p$ and
certain observed values $x$. This can be thrown into a form such as
$$k(x, \theta_1 \ldots \theta_p) = 0, \tag{18.33}$$
where $k$ is a given function (not a frequency function). If we have $n$ values of $x$ and $n > p$
it is not possible to solve the $n$ resulting equations of type (18.33) for the $\theta$'s. We then
consider the “residuals” $k(x_j, \theta_1 \ldots \theta_p)$, and the principle of least squares states that
the values of $\theta_1 \ldots \theta_p$ are to be chosen so that
$$\Sigma \{k(x_j, \theta_1 \ldots \theta_p)\}^2 = \text{minimum}, \tag{18.34}$$
or, in other words, so as to satisfy the $p$ equations
$$\frac{\partial}{\partial \theta_i} \Sigma \{k(x_j, \theta_1 \ldots \theta_p)\}^2 = 0, \qquad i = 1 \ldots p. \tag{18.35}$$

18.15. Consider the case when the residuals are all distributed normally with variance
$\sigma^2$. The logarithm of the likelihood is then (except for constants)
$$\log L = -n \log \sigma - \frac{1}{2\sigma^2} \Sigma\, k^2(x_j, \theta_1 \ldots \theta_p) \tag{18.36}$$

and this is clearly maximised by minimising the sum (18.34). In this case, then, the method 
of least squares is equivalent to the method of maximum likelihood. In other cases it 
may give different results, and the justification for using it then becomes more or less 
empirical. 
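
A small numerical check makes the equivalence concrete. The sketch below is only an illustration: the one-parameter residual $k = y - \theta x$, the data pairs and the function names are assumptions, not taken from the text. It computes the least-squares value of $\theta$ in closed form and confirms that moving away from it both increases the sum of squares (18.34) and decreases the normal log-likelihood (18.36) for a fixed $\sigma$.

```python
import math


def residuals(data, theta):
    """Residuals k(x_j, theta) = y_j - theta * x_j for a line through the origin."""
    return [y - theta * x for x, y in data]


def sum_of_squares(data, theta):
    """The quantity minimised in (18.34)."""
    return sum(k * k for k in residuals(data, theta))


def log_likelihood(data, theta, sigma):
    """Normal log-likelihood (18.36), constants omitted."""
    n = len(data)
    return -n * math.log(sigma) - sum_of_squares(data, theta) / (2.0 * sigma**2)


# Hypothetical (x, y) observations.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
sigma = 0.5  # any fixed positive value serves for the comparison

# Solution of (18.35) for this one-parameter case.
theta_ls = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

steps = (-0.1, -0.01, 0.01, 0.1)
ss_increases = [sum_of_squares(data, theta_ls + d) > sum_of_squares(data, theta_ls) for d in steps]
ll_decreases = [
    log_likelihood(data, theta_ls + d, sigma) < log_likelihood(data, theta_ls, sigma)
    for d in steps
]
```

Because $\sigma$ enters (18.36) only as a fixed multiplier of the sum of squares, the same $\theta$ is optimal whatever value of $\sigma$ is supplied.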

18.16. The most important case occurring in statistical theory of the use of the 
method of least squares concerns regression equations. We have already seen that the 
coefficients of regression are, in effect, determined so as to minimise the sum of squares of 
residuals (cf. 15.2). We also know that, for the multiple normal distribution, residuals 
from the population regression lines are, in fact, normally distributed (15.13). For normal
variation, therefore, the method of least squares is equivalent to maximum likelihood so
far as concerns the simultaneous estimation of regression coefficients.

18.17. This is a convenient point to prove a theorem (due to Gauss) which in one 
form or another is constantly occurring in statistical theory, particularly in connection 
with the normal distribution. Suppose we have a population (not necessarily normal) 
in which the regression of one variate $y$ on the others $x_0$ (= 1), $x_1, \ldots, x_p$ is given by
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p. \tag{18.37}$$
The $x$'s may be correlated among themselves and, in the extreme case, functionally related,
so that this case includes that of curvilinear regression for our present purposes. Suppose
that we have a sample of $n$ values, where $n > p$. Denoting by $\Sigma$ summation over these
$n$ values, we determine the estimates of the $\beta$'s by minimising the sum of squares, e.g.
$$\Sigma (y - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p)^2.$$
Suppose that $b_0 \ldots b_p$ are the solutions of this process. Then our regression formula is
$$y - b_0 - b_1 x_1 - \ldots - b_p x_p = 0. \tag{18.38}$$
The observed residuals, obtained by substituting the observed values in this equation,
are typified by
$$e = y - b_0 - b_1 x_1 - \ldots - b_p x_p, \tag{18.39}$$
whereas the “real” residuals are typified by
$$\varepsilon = y - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p. \tag{18.40}$$
We proceed to compare the sampling variances of $e$ and $\varepsilon$ and to show that
$$\operatorname{var} \varepsilon = \frac{n}{n - p - 1} \operatorname{var} e, \tag{18.41}$$
provided that the residuals are uncorrelated.

Let us transform the observed values of the $x$'s to new values $\xi_0 \ldots \xi_p$ ($n$ for
each) such that
$$\Sigma (x_j \xi_k) = 1, \quad j = k; \qquad \Sigma (x_j \xi_k) = 0, \quad j \ne k; \qquad \Sigma (\xi_k y) = b_k. \tag{18.42}$$
This involves, for each $\xi$, $p + 1$ equations in $n$ unknowns and is therefore possible in general.
We then have
$$\Sigma\, \xi_k (e - \varepsilon) = \Sigma\, \xi_k \{(\beta_0 - b_0) + (\beta_1 - b_1) x_1 + \ldots + (\beta_p - b_p) x_p\} = \beta_k - b_k.$$
But
$$\Sigma\, \xi_k e = \Sigma (\xi_k y) - \Sigma\, \xi_k \{b_0 + b_1 x_1 + \ldots + b_p x_p\} = b_k - b_k = 0.$$
Hence
$$\beta_k - b_k = -\Sigma\, \xi_k \varepsilon. \tag{18.43}$$
Now
$$\Sigma\, e (e - \varepsilon) = \Sigma \{y - b_0 - \ldots - b_p x_p\} \{(\beta_0 - b_0) + \ldots + (\beta_p - b_p) x_p\} = 0,$$
since the summations give terms the vanishing of which determines the $b$'s. Hence
$$\Sigma\, \varepsilon^2 - \Sigma\, e^2 = \Sigma (\varepsilon - e)\, \varepsilon = S (b_j - \beta_j)\, \Sigma\, x_j \varepsilon,$$
where $S$ denotes summation over the $(p + 1)$ values of $j$,
$$= S\, \Sigma\, \xi_j \varepsilon\, \Sigma\, x_j \varepsilon = S \{\Sigma\, x_j \xi_j \varepsilon^2\} + \text{cross-product terms in } \varepsilon = S\, \varepsilon^2 + \text{cross-product terms}.$$
When we take expectations the cross-product terms vanish since the residuals are uncorrelated.
Hence
$$E (\Sigma\, \varepsilon^2) - E (\Sigma\, e^2) = E\, S\, \varepsilon^2,$$
or
$$(n - p - 1) \operatorname{var} \varepsilon = n \operatorname{var} e, \tag{18.44}$$
from which (18.41) follows at once.

For normal variation we shall consider this result from a slightly different viewpoint 
in Chapter 22. 
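
The factor $(n - p - 1)/n$ can be verified exactly for a straight-line regression ($p = 1$). The sketch below is an illustration only, with assumed $x$-values and a hypothetical helper `leverage`. It relies on the standard fact, not derived in the text, that for uncorrelated residuals of common variance $\sigma^2$ the expected square of the $i$th observed residual is $\sigma^2 (1 - h_i)$, where $h_i = 1/n + (x_i - \bar{x})^2 / \Sigma (x - \bar{x})^2$.

```python
def leverage(xs):
    """Diagonal elements h_i of the projection onto (1, x) for simple regression."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]


# Hypothetical values of the single regressor x_1.
xs = [0.5, 1.0, 2.0, 3.5, 4.0, 6.0, 7.5]
n, p = len(xs), 1

h = leverage(xs)

# E(sum of e^2) / (n * var eps): the ratio var e / var eps implied by (18.44).
expected_ratio = sum(1.0 - hi for hi in h) / n
```

The leverages sum to $p + 1 = 2$ whatever the $x$-values, so `expected_ratio` equals $(n - p - 1)/n$, in agreement with (18.44).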


NOTES AND REFERENCES 

The approach to minimum-variance estimators through the Calculus of Variations is 
due to Aitken and Silverstone (1942). For minimum $\chi^2$ see K. Smith (1916) and R. A.
Fisher (1922a, 1925b). For the modification $\chi'^2$ see Jeffreys (1938b, 1939b, 1941).

A method of estimation essentially depending on the median has been proposed for 
use in quality control, but its value is as yet problematical. For an account of the technique 
see Simon (1941). 


EXERCISES 

18.1. From the property that the variance of a minimum-variance estimator is
equal to $\lambda$ show that the most general distribution for which the sample mean is a sufficient
estimator is
$$f(x, \theta) = c(x, \sigma) \exp\left\{ -\frac{1}{2\sigma^2} (x - \theta)^2 \right\},$$
where $c$ is an arbitrary function and $\sigma^2$ is the variance of $f$.

Hence show that no Pearson curve other than the normal admits the sample-mean 
as a sufficient estimator, but that a Gram-Charlier series may do so. 

(Aitken and Silverstone, 1942.) 

18.2. If the function $\lambda$ exists and



show that the variance of the estimator $t$ is
$$-\frac{1}{n} \frac{d^2 q}{dp^2},$$
where $q$ is the function of 18.7. (Aitken and Silverstone, 1942.)

18.3. If a population $(p + q)^4$ is regarded as distributed in 5 classes, show that the
intrinsic accuracy is $\dfrac{4n}{pq}$. Show further that the loss of information through estimating
$p$ from minimum $\chi^2$ is

JL (3p* -2 pq + ‘iq 2 ) - ( (P * “ 2p * q + 18 2 , *9* “ 2 M 3 + g 4 ) 2 -. 

This is least when $p = q$ and is then equivalent to the loss of 5 observations.

(Fisher, 1925b.)



CHAPTER 19 


CONFIDENCE INTERVALS 

19.1. In the previous two chapters we have been concerned with methods which
will provide an estimate of the value of one or more unknown parameters; and the methods
gave functions of the sample values (the estimators) which, for any given sample, provided
a unique estimate. It was of course fully recognised that the estimate might differ
from the parameter in any particular case, and hence that there was a margin of uncertainty.
The extent of this uncertainty was expressed in terms of the sampling variance
of the estimator. With the somewhat intuitional approach which has served our purpose
up to this point, we say that it is probable that $\theta$ lies in the range $t \pm \sqrt{\operatorname{var} t}$, very probable
that it lies in the range $t \pm 2\sqrt{\operatorname{var} t}$, and so on. In short, what we have done is in effect
to locate $\theta$ in a range and not at a particular point, although we have regarded one point
in the range, viz. $t$ itself, as having a claim to be considered as the “best” estimate of $\theta$.

19.2. In the present chapter we shall examine the logic of this procedure more
closely and look at the problem of estimation from a different point of view. We now
abandon attempts to estimate $\theta$ by a function which, for a specified sample, gives a unique
number. Instead we shall consider merely the specification of a range in which $\theta$ lies.
We shall not attempt to specify whereabouts in the interval the value of $\theta$ really is; all
values in the range have an equal claim to be taken as the “true” value. Nor shall we
assess the probability that $\theta$ lies in the interval in the sense that $\theta$ is regarded as a random
variable. In fact, in the frequency theory of probability $\theta$ is not a random variable (except
trivially in that the frequency of $\theta$ is unity when it takes the true value and is zero elsewhere).
Nevertheless, probability plays an essential part in the determination of the
interval and in the degree of confidence we have that it “covers” $\theta$.


Case of one Unknown Parameter 

19.3. Consider in the first place a population dependent on a single unknown parameter
$\theta$ and suppose that we are given a random sample of $n$ values $x_1 \ldots x_n$ from the
population. Let $z$ be a statistic dependent on the $x$'s and on $\theta$, whose sampling distribution
is independent of $\theta$. (The examples given below will show that in some cases at least such
a statistic may be found.) Then, given any probability $\alpha$, we can find a value $z_1$ such that
$$\int_{-\infty}^{z_1} dF(z) = \alpha,$$
and this is true whatever the value of $\theta$. In the notation of the theory of probability we
shall then have
$$P(z \le z_1 \mid \theta) = \alpha. \tag{19.1}$$
Now it may happen that the inequality $z \le z_1$ can be transformed to the form $\theta \le t_1$ or
$\theta \ge t_1$, where $t_1$ is some function depending on the value $z_1$ and the $x$'s but not on $\theta$. For
instance, if $z = \bar{x} - \theta$ we shall have
$$\bar{x} - \theta \le z_1$$
and hence
$$\theta \ge \bar{x} - z_1.$$
If this transformation can be made we then have, from (19.1),
$$P(\theta \le t_1 \mid \theta) = \alpha. \tag{19.2}$$
More generally, suppose that we can find a function $t_1$, depending on $\alpha$ and the $x$'s
but not on $\theta$, such that (19.2) is true for all $\theta$. Then we may use this equation in probability
to make certain statements about $\theta$.

19.4. Note, in the first place, that we cannot assert that the probability is $\alpha$ that
$\theta$ does not exceed a constant $t_1$. This statement (in the frequency theory of probability)
can only relate to the variation of $\theta$ in a population of $\theta$'s, and in general we do not know
that $\theta$ varies at all. If it is merely an unknown constant then the probability that $\theta \le t_1$
is either unity or zero. We do not know which of these values is correct, but we do know
that one of them is correct.

We therefore look at the matter in another way. Although $\theta$ is not a random variable,
$t_1$ is and will vary from sample to sample. Consequently, if we assert that $\theta \le t_1$ in each
case presented for decision, we shall be right in a proportion $\alpha$ of the cases in the long run.
The statement that the probability of $\theta$ is less than or equal to some assigned value
has no meaning except in the trivial sense already mentioned; but the statement that
a statistic $t_1$ is greater than or equal to $\theta$ (whatever $\theta$ happens to be) has a definite probability
$\alpha$ of being correct. If therefore we make it a rule to assert the inequality $\theta \le t_1$
for any sample values which arise, we have the assurance of being right in a proportion
$\alpha$ of the cases “on the average” or “in the long run.”

This idea is basic to the theory of confidence intervals which we proceed to develop, 
and the reader should satisfy himself that he has grasped it. 
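
The long-run reading of the rule can be checked by a small sampling experiment. The sketch below is illustrative only and rests on assumptions of our own: samples of n are drawn from a normal population with unit variance and fixed mean theta, the limit is taken as t1 = xbar + d/sqrt(n) with d the alpha-quantile of the unit normal curve, and the function name coverage_of_upper_limit is hypothetical.

```python
import random
from statistics import NormalDist


def coverage_of_upper_limit(theta, n, alpha, trials, seed=0):
    """Proportion of samples for which the assertion theta <= t1 is true.

    Assumed setup (not from the text): samples of n from a normal
    population with mean theta and unit variance, and
    t1 = xbar + d / sqrt(n), d being the alpha-quantile of N(0, 1).
    """
    rng = random.Random(seed)
    d = NormalDist().inv_cdf(alpha)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        t1 = xbar + d / n ** 0.5   # t1 varies from sample to sample
        if theta <= t1:            # theta itself is a fixed constant
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The proportion should be close to alpha whatever theta may be.
    for theta in (-3.0, 0.0, 10.0):
        print(theta, coverage_of_upper_limit(theta, n=10, alpha=0.9, trials=20000))
```

In any single sample the assertion is simply true or false; it is only the proportion over many samples that settles near the chosen probability.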

19.5. To simplify the exposition we have considered only a single quantity $t_1$ and
the statement that $\theta \le t_1$. In practice, however, we usually seek two quantities $t_0$
and $t_1$, such that

$$P\{t_0 \le \theta \le t_1 \mid \theta\} = \alpha, \tag{19.3}$$

and make the assertion that $\theta$ lies in the range $t_0$ to $t_1$. These quantities are known as the
Lower and Upper Confidence Limits respectively. They depend only on $\alpha$ and the sample
values. For any fixed $\alpha$ the totality of values of $t_0$ and $t_1$ for different samples determines
a field within which $\theta$ is asserted to lie. This field is called the Confidence Belt or Region
of Acceptance. We shall give a graphical representation of the idea below. The number
$\alpha$ is called the Confidence Coefficient.

Example 19.1

Suppose we have a sample of $n$ from the normal population with unit variance

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\}\, dx, \qquad -\infty \le x \le \infty.$$

The distribution of means $\bar{x}$ will be

$$dF = \sqrt{\frac{n}{2\pi}} \exp\{-\tfrac{n}{2}(\bar{x} - \mu)^2\}\, d\bar{x}, \qquad -\infty \le \bar{x} \le \infty.$$

From the tables of the normal integral we know that the probability that a deviation
from the mean does not exceed twice the standard deviation in the positive direction is
0.97725. We have then

$$P\{\bar{x} - \mu \le 2/\sqrt{n} \mid \mu\} = 0.97725,$$

which is equivalent to

$$P\{\bar{x} - 2/\sqrt{n} \le \mu \mid \mu\} = 0.97725.$$

Thus, if we assert that $\mu$ is greater than or equal to $\bar{x} - 2/\sqrt{n}$ we shall be right in about
97.725 per cent of the cases.

Similarly we have

$$P\{\mu \le \bar{x} + 2/\sqrt{n} \mid \mu\} = 0.97725.$$

Hence, combining the two results,

$$P\{\bar{x} - 2/\sqrt{n} \le \mu \le \bar{x} + 2/\sqrt{n} \mid \mu\} = 2(0.97725) - 1 = 0.9545.$$

Hence, if we assert that $\mu$ lies in the range $\bar{x} \pm 2/\sqrt{n}$ we shall be right in about 95.45 per
cent of the cases in the long run.

Conversely, given the confidence coefficient we can easily find from the tables of the
normal integral the deviation $d$ such that

$$P\{\bar{x} - d/\sqrt{n} \le \mu \le \bar{x} + d/\sqrt{n} \mid \mu\} = \alpha.$$

For instance, if $\alpha = 0.8$, $d = 1.28$, so that if we assert that $\mu$ lies in the range
$\bar{x} \pm 1.28/\sqrt{n}$ the odds are 4 to 1 that we shall be right.
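
The reference to the tables of the normal integral can be replaced by a short computation. The following is a minimal sketch using only the Python standard library; the function names are our own and the population is assumed normal with unit variance, as in the example.

```python
from statistics import NormalDist


def central_deviation(alpha):
    """Deviation d with P(-d <= Z <= d) = alpha for a unit normal Z."""
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)


def central_limits(xbar, n, alpha):
    """Central limits xbar -/+ d/sqrt(n) for the mean (unit variance assumed)."""
    half_width = central_deviation(alpha) / n ** 0.5
    return xbar - half_width, xbar + half_width


def coefficient_for_deviation(d):
    """Confidence coefficient attached to the limits xbar -/+ d/sqrt(n)."""
    return 2.0 * NormalDist().cdf(d) - 1.0


if __name__ == "__main__":
    print(round(central_deviation(0.8), 2))           # deviation for alpha = 0.8
    print(round(coefficient_for_deviation(2.0), 4))   # coefficient for d = 2
    print(central_limits(xbar=5.0, n=16, alpha=0.8))
```

The first two printed values correspond to the figures 1.28 and 0.9545 quoted in the example.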

The reader to whom this approach is new will probably ask: but is this not a roundabout
way of using the standard error to set limits to an estimate of the mean? In a
way, it is. In effect, what we have done in this example is to show how the use of the
standard error of the mean in normal samples may be justified on logical grounds without
appeal to new principles of inference other than those incorporated in the theory of
probability itself. In particular we make no use of Bayes' postulate.*

Another point of interest in this example is that the upper and lower confidence limits
derived above are equidistant from the mean $\bar{x}$. This is not by any means necessary,
and it is easy to see that we can derive any number of alternative limits for the same
confidence coefficient $\alpha$. Suppose, for instance, we take $\alpha = 0.9545$, and select two numbers
$\alpha_0$ and $\alpha_1$ which obey the condition

$$\alpha_0 + \alpha_1 - 1 = 0.9545,$$

say $\alpha_0 = 0.9645$ and $\alpha_1 = 0.99$. From the tables of the normal integral we have

$$P\{\bar{x} - \mu \le 2.326/\sqrt{n} \mid \mu\} = 0.99,$$

$$P\{\bar{x} - \mu \ge -1.806/\sqrt{n} \mid \mu\} = 0.9645,$$

and hence

$$P\{\bar{x} - 2.326/\sqrt{n} \le \mu \le \bar{x} + 1.806/\sqrt{n} \mid \mu\} = 0.9545.$$

Thus, with the same confidence coefficient we can assert that $\mu$ lies in the range
$\bar{x} - 2/\sqrt{n}$ to $\bar{x} + 2/\sqrt{n}$, or in the range $\bar{x} - 2.326/\sqrt{n}$ to $\bar{x} + 1.806/\sqrt{n}$. In either case
we shall be right in about 95.45 per cent of the cases.

We note that in the first case the range is $4/\sqrt{n}$ units and in the second case it is
$4.132/\sqrt{n}$ units. Other things being equal, we should choose the first set of limits since
they locate the parameter in a narrower range. We shall consider this point in more
detail below. It does not always happen that there is an infinity of possible confidence
limits or, if there is, that any simple rule of choice between them can be formulated.
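
The comparison of the two ranges can be reproduced directly. In the sketch below the names limits_for_mean and interval_length, and the parameters p_lower and p_upper (the probabilities attached to the lower-limit and upper-limit assertions separately), are our own labels introduced for illustration; a normal population with unit variance is assumed as in Example 19.1.

```python
from statistics import NormalDist


def limits_for_mean(xbar, n, p_lower, p_upper):
    """Limits (t0, t1) for the mean of a unit-variance normal population.

    p_lower is the probability attached to the assertion t0 <= mu and
    p_upper that attached to mu <= t1; the confidence coefficient of the
    pair is p_lower + p_upper - 1.
    """
    z = NormalDist()
    root_n = n ** 0.5
    t0 = xbar - z.inv_cdf(p_lower) / root_n
    t1 = xbar + z.inv_cdf(p_upper) / root_n
    return t0, t1


def interval_length(n, p_lower, p_upper):
    """Length t1 - t0, which does not depend on the observed mean."""
    t0, t1 = limits_for_mean(0.0, n, p_lower, p_upper)
    return t1 - t0


if __name__ == "__main__":
    # Both choices have coefficient 0.9545; lengths are in units of 1/sqrt(n).
    print(round(interval_length(1, 0.97725, 0.97725), 3))  # central
    print(round(interval_length(1, 0.99, 0.9645), 3))      # non-central
```

Trying other pairs with the same sum shows the length increasing as the two probabilities are made more unequal.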

Graphical Representation

19.6. In a number of simple cases, including that of the previous example, the
confidence limits can be represented in a useful graphical form. We take two orthogonal
axes, OX relating to the observed $\bar{x}$ and OY to $\mu$ (see Fig. 19.1).

Fig. 19.1.

The two straight lines shown have as their equations

$$\mu = \bar{x} + 2, \qquad \mu = \bar{x} - 2.$$

Consequently, for any point between the lines,

$$\bar{x} - 2 \le \mu \le \bar{x} + 2.$$

Hence, if for any observed $\bar{x}$ we read off the two ordinates on the lines corresponding to
that value we obtain the two confidence limits. The vertical interval between the limits
is the confidence range (shown in the diagram for $\bar{x} = 1$), and the total zone between the
lines is the confidence belt. We may refer to the two lines as the Upper and Lower
Confidence lines respectively.

This example relates to the somewhat trivial case $n = 1$. For different values of $n$
there will be different confidence lines, all parallel to $\mu = \bar{x}$. They may be shown on a
single diagram for selected values of $n$, and a figure so constructed provides a useful method
of reading off confidence limits in practical work.

Central and Non-central Intervals

19.7. In Example 19.1 the sampling distribution on which the confidence intervals
were based was symmetrical, and hence, by taking equal deviations from the mean, we
reached equal areas of the frequency function as $\alpha_0$ and $\alpha_1$. In general we cannot achieve
this result with equal deviations, and subject always to the condition $\alpha_0 + \alpha_1 - 1 = \alpha$
the two quantities may be chosen arbitrarily.

If $\alpha_0$ and $\alpha_1$ are taken to be equal, we shall say that the intervals are central. In such
a case we have

$$P(t_0 \le \theta) = P(\theta \le t_1) = \frac{1 + \alpha}{2}. \tag{19.4}$$

In the contrary case the intervals will be called non-central.

19.8. In the absence of other considerations it is usually convenient to employ
central intervals, but circumstances sometimes arise in which non-central intervals are
more serviceable. Suppose, for instance, we are estimating the proportion of some drug
in a medicinal preparation and the drug is toxic in large doses. We must then clearly
err on the safe side, an excess of the true value over our estimate being more serious than
a deficiency. In such a case we might prefer to take $\alpha_1$ very near to unity or even equal
to unity, so that

$$P(\theta \le t_1) = 1, \qquad P(t_0 \le \theta) = \alpha,$$

and we are certain that $\theta$ is not greater than $t_1$.

Again, if we are estimating the proportion of viable seed in a sample of material that
is to be placed on the market, we are more concerned with the accuracy of the lower limit
than that of the upper limit, for a deficiency of germination is more serious than an excess
from the grower's point of view. In such circumstances we should probably take $\alpha_0$ as
large as conveniently possible so as to be nearer to certainty about the minimum value
of viability. This kind of situation often arises in the specification of the quality of a
manufactured product, the seller wishing to guarantee a minimum standard but being
much less concerned with whether his product exceeds expectation.

19.9. On a somewhat similar point, it may be remarked that in certain circumstances
it is enough to know that $P\{t_0 \le \theta \le t_1 \mid \theta\}$ exceeds some quantity $\alpha$. We then
know that in asserting $\theta$ to lie in the range $t_0$ to $t_1$ we shall be right in at least a proportion
$\alpha$ of the cases. Mathematical difficulties in ascertaining confidence limits exactly for
given $\alpha$, or theoretical difficulties when the distribution is discontinuous may, for example,
lead us to be content with the inequality rather than the equality of (19.3).

Example 19.2

To find confidence intervals for the parent proportion $\varpi$ of successes in sampling for
attributes.

In samples of $n$ the distribution of successes is given by the binomial $(\chi + \varpi)^n$, where
$\chi = 1 - \varpi$. We will determine the limits for the case $n = 20$ and confidence coefficient 0.95.

We require in the first instance the distribution function of the binomial, which is
obtainable from Table 5.2 (vol. I, p. 119). Summing the number of successes and dividing
by 10,000, we find from that table the following:

Proportion of
Successes p    $\varpi = 0.1$   $\varpi = 0.2$   $\varpi = 0.3$   $\varpi = 0.4$   $\varpi = 0.5$

   0.00           0.1216           0.0115           0.0008             -                -
   0.05           0.3918           0.0691           0.0076           0.0005             -
   0.10           0.6770           0.2060           0.0354           0.0036           0.0002
   0.15           0.8671           0.4114           0.1070           0.0159           0.0013
   0.20           0.9569           0.6296           0.2374           0.0509           0.0059
   0.25           0.9888           0.8042           0.4163           0.1255           0.0207
   0.30           0.9977           0.9133           0.6079           0.2499           0.0577
   0.35           0.9997           0.9678           0.7722           0.4158           0.1316
   0.40           1.0001           0.9900           0.8866           0.5955           0.2517
   0.45           1.0002           0.9974           0.9520           0.7552           0.4119
   0.50             -              0.9994           0.9828           0.8723           0.5881
   0.55             -              0.9999           0.9948           0.9433           0.7483
   0.60             -              1.0000           0.9987           0.9788           0.8684
   0.65             -                -              0.9997           0.9934           0.9423
   0.70             -                -              0.9999           0.9983           0.9793
   0.75             -                -                -              0.9996           0.9941
   0.80             -                -                -              0.9999           0.9987
   0.85             -                -                -              1.0000           0.9998
   0.90             -                -                -                -              1.0000
   0.95             -                -                -                -                -

The final figures may be a unit or two in error owing to rounding up, but that need
not bother us to the degree of approximation here considered. Values for $\varpi = 0.6$ to 0.9
may be obtained by symmetry.

We note in the first place that the variate $p$ is discontinuous. On the other hand
we are prepared to consider any value of $\varpi$ in the range 0 to 1. For given $\varpi$ we cannot
in general find limits to $p$ for which $\alpha$ is exactly 0.95; but we will take $p$ to be the nearest
multiple of 0.05 which gives confidence coefficients at least equal to 0.95, so as to be on
the safe side. We will consider only central intervals, so that for given $\varpi$ we have to find
$p_0$ and $p_1$ such that

$$P\{\varpi \ge p_0\} \ge 0.975, \qquad P\{\varpi \le p_1\} \ge 0.975,$$

the inequalities for $P$ being as near to equality as we can make them.

Consider the diagrammatic representation of the type shown in Fig. 19.1 and given
for our present case in Fig. 19.2.

From the table we can find, for any assigned $\varpi$, the values $\varpi_0$ and $\varpi_1$ such that
$P(p \ge \varpi_0) \ge 0.975$ and $P(p \le \varpi_1) \ge 0.975$. Note that in determining $\varpi_1$ the distribution
function gives the probability of obtaining a proportion $p$ or less successes, so that the
complement of the function gives the probability of a proportion $1 - p - 0.05$ or less
(not $1 - p$). Here, for example, on the horizontal through $\varpi = 0.1$ we find $\varpi_0 = 0$ and
$\varpi_1 = 0.30$ from our table; and for $\varpi = 0.4$ we have $\varpi_0 = 0.15$ and $\varpi_1 = 0.65$. The points
so obtained lie on stepped curves which have been drawn in. The zone between them is
the confidence belt. For any $p$ the probability that we shall be wrong in locating $\varpi$ inside
the belt is at the most 0.05. We determine $p_0$ and $p_1$ by drawing a vertical at the given
value of $p$ on the abscissa and reading off the values where it intersects the curves. That
these are, in fact, the required limits will be shown in a moment.

We could have found more precise confidence limits by interpolating in the table
obtained above. For example, with $p = 0.30$ we see that

for $\varpi = 0.1$, $P = 0.9977$
for $\varpi = 0.2$, $P = 0.9133$.

Hence, for $P = 0.975$ we have approximately

$$\varpi = 0.1 + \frac{9977 - 9750}{9977 - 9133}\,(0.1) = 0.127,$$

and closer approximations can be obtained if desired. The corresponding point on the
lower confidence line $\varpi = 0.127$ is $p = 0.35$. Calculations on these lines give us the
values of $\varpi$ such that

$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ exactly},$$

whereas the former approach gave values such that

$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ approximately}, \quad \ge \alpha \text{ in any case}.$$

Fig. 19.2.

Discontinuous variates usually give rise to this sort of arithmetical nuisance, but the
approximation in practice is sufficiently good, except for very small samples. The broken
curves in Fig. 19.2 give the more precise limits. They lie, of course, inside the more
approximate step-curves.
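
The tabular work of this example can be imitated by direct computation. The sketch below is not the author's procedure but an illustration under stated assumptions: the function names are our own; the stepped limits are found by reading the inner inequalities strictly (the largest count k0 with P(X > k0) >= 0.975, taken as 0 when none qualifies, and the smallest count k1 with P(X < k1) >= 0.975), which is the reading that returns the values quoted above for 0.1 and 0.4; and the more precise limit is found by bisection on the exact distribution function instead of by linear interpolation in the table.

```python
from math import comb


def binomial_cdf(k, n, parent):
    """P(X <= k) when X is binomial with index n and proportion `parent`."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * parent ** j * (1.0 - parent) ** (n - j)
               for j in range(min(k, n) + 1))


def stepped_limits(n, parent, tail=0.975):
    """Counts (k0, k1) bounding the belt on the horizontal through `parent`.

    Assumed convention: k0 is the largest count with P(X > k0) >= tail
    (taken as 0 when no count qualifies); k1 is the smallest count with
    P(X < k1) >= tail.
    """
    k0 = 0
    for k in range(n + 1):
        if 1.0 - binomial_cdf(k, n, parent) >= tail:
            k0 = k
    k1 = next(k for k in range(n + 2) if binomial_cdf(k - 1, n, parent) >= tail)
    return k0, k1


def parent_for_cdf(k, n, target, tol=1e-10):
    """Solve P(X <= k | parent) = target for parent by bisection.

    For 0 <= k < n the distribution function falls steadily from 1 to 0
    as parent rises from 0 to 1, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binomial_cdf(k, n, mid) > target:
            lo = mid    # function still above target: move parent up
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 20
    for parent in (0.1, 0.4):
        k0, k1 = stepped_limits(n, parent)
        print(parent, k0 / n, k1 / n)
    # Linear interpolation in the table, then the bisection root.
    print(0.1 + (0.9977 - 0.9750) / (0.9977 - 0.9133) * 0.1)
    print(parent_for_cdf(6, n, 0.975))
```

For the case treated above (six successes or fewer in twenty, probability 0.975) the bisection root lies a little above 0.15, so linear interpolation across a tabular interval as wide as 0.1 should be regarded only as a first approximation.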

It is, perhaps, worth noticing that the points on the curves of Fig. 19.2 were constructed
by selecting an ordinate $\varpi$ and then finding the corresponding abscissae $\varpi_0$ and $\varpi_1$. The
diagram is, so to speak, constructed horizontally. In applying it, however, we read it
vertically, that is to say, with observed abscissa $p$ we read off two values $p_0$ and $p_1$ and
assert that $p_0 \le \varpi \le p_1$. It is instructive to observe how this change of viewpoint can
be justified without reference to Bayes' postulate.

Consider Fig. 19.3, which shows a pair of confidence lines for the binomial. Let $\varpi'$
be a given value of $\varpi$ and let the horizontal through $\varpi'$ meet the confidence lines in points
with abscissae $\varpi_0$ and $\varpi_1$. Then we know that in repeated samples from a population
with parameter $\varpi'$ a proportion $\alpha$ will give observed values of $p$ lying between $\varpi_0$ and $\varpi_1$;
for the curves were constructed so that this should be so.

Now since the horizontal at $\varpi'$ lies entirely within the confidence belt for $\varpi_0 \le p \le \varpi_1$
(and does so for any $\varpi'$), it follows that the assertion that $\varpi'$ lies in the belt is correct if,
and only if, $p$ lies between $\varpi_0$ and $\varpi_1$, that is in a proportion $\alpha$ of the cases. This, being
true for any $\varpi'$, is true for all $\varpi'$, irrespective of the relative frequency of occurrence of the
$\varpi$'s under estimate. Consequently our assertion that $\varpi$ lies in the confidence belt is correct
in a proportion $\alpha$ of the cases; and, in particular, for any observed $p$ we may assert that
$\varpi$ lies within the ordinates determined on the two curves by the vertical through $p$.


Confidence Intervals for Large Samples 

19.10. In our usual notation, the logarithm of the likelihood function gives

$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta),$$

and

$$\frac{\partial \log L}{\partial\theta} = \sum_{j=1}^{n} \frac{\partial \log f(x_j, \theta)}{\partial\theta}.$$

We may regard $\partial \log L / \partial\theta$ as a random variable, and in particular write

(19.5)
(19.6)

so that

(19.7)

Write

$$\psi = \frac{\partial \log L / \partial\theta}{\sqrt{n \operatorname{var}(\partial \log f / \partial\theta)}}. \tag{19.8}$$

Then, for large samples, $\psi$ will be distributed normally in the limit with unit variance, in
virtue of the Central Limit Theorem, under very general conditions. It will also have
zero mean, since

$$E\left(\frac{\partial \log f}{\partial\theta}\right) = \int \frac{1}{f}\frac{\partial f}{\partial\theta}\, f\, dx = \frac{\partial}{\partial\theta} \int f\, dx = 0. \tag{19.9}$$

Hence, from the distribution of $\psi$ we may easily determine confidence limits for $\theta$ in large
samples if $\psi$ is a monotonic function of $\theta$, so that inequalities in one may be transformed to
inequalities in the other.

It is sufficient (but not necessary) for the existence of the normal limit to $\psi$ that
$\partial f / \partial\theta$ exists for all $x$, except perhaps at isolated points, that the range is independent of
$\theta$ and that the Central Limit Theorem applies (e.g. if the third moment of
$\partial \log f / \partial\theta$ exists). We also assume, as usual, that differentiation under the integral sign,
as in (19.9), is legitimate.

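Example 19.3

Consider again the problem of Example 19.1. We have, with $\mu$ for $\theta$,

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\}.$$

Hence

$$\frac{\partial \log f}{\partial\mu} = x - \mu, \qquad \operatorname{var}\left(\frac{\partial \log f}{\partial\mu}\right) = 1,$$

and

$$\psi = \frac{\sum (x - \mu)}{\sqrt{n}} = (\bar{x} - \mu)\sqrt{n}$$

is normally distributed with unit variance for large $n$. (We know, of course, that this
is true for small $n$ as well in this particular case.) The confidence limits may then be set
as in Example 19.1.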


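Example 19.4

Consider the Poisson distribution whose general term is

$$f(x, \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}.$$

We have

$$\frac{\partial \log f}{\partial\lambda} = \frac{x}{\lambda} - 1,$$

and, since the variance of $x$ is $\lambda$, the variance of this quantity is $1/\lambda$. Hence

$$\psi = \frac{\sum x/\lambda - n}{\sqrt{n/\lambda}} = (\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}}.$$

For example, with $\alpha = 0.95$, corresponding to a normal deviate $\pm 1.96$, we have, for the
central confidence limits,

$$(\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}} = \pm 1.96,$$

giving, on solution for $\lambda$,

$$\lambda^2 - \left(2\bar{x} + \frac{3.84}{n}\right)\lambda + \bar{x}^2 = 0$$

or

$$\lambda = \bar{x} + \frac{1.92}{n} \pm \sqrt{\frac{3.84\,\bar{x}}{n} + \frac{3.69}{n^2}},$$

the ambiguity in the square root giving upper and lower limits respectively.
To order $n^{-1/2}$ this is equivalent to

$$\lambda = \bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}},$$

from which the upper and lower limits are seen to be equidistant from the mean $\bar{x}$, as we
should expect.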
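
The large-sample limits for the Poisson parameter are easily computed. The sketch below solves the quadratic obtained by squaring the relation between the mean, the parameter and the normal deviate, and also gives the first-order limits; the function names are our own and the normal deviate is taken from the standard library rather than from tables.

```python
from math import sqrt
from statistics import NormalDist


def poisson_large_sample_limits(xbar, n, alpha=0.95):
    """Roots in lam of n * (xbar - lam)**2 = d**2 * lam.

    d is the central normal deviate for coefficient alpha (1.96 for 0.95).
    Returns (lower, upper).
    """
    d = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    centre = xbar + d * d / (2.0 * n)
    half = sqrt(d * d * xbar / n + d ** 4 / (4.0 * n * n))
    return centre - half, centre + half


def poisson_first_order_limits(xbar, n, alpha=0.95):
    """Limits xbar -/+ d * sqrt(xbar / n), correct to order n**(-1/2)."""
    d = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    half = d * sqrt(xbar / n)
    return xbar - half, xbar + half


if __name__ == "__main__":
    print(poisson_large_sample_limits(xbar=4.0, n=50))
    print(poisson_first_order_limits(xbar=4.0, n=50))
```

The exact roots are not quite symmetrical about the observed mean, the centre being displaced upward by a term of order 1/n; the first-order limits ignore this displacement.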


Shortest Sets of Confidence Intervals 

19.11. It has been seen in Example 19.1 that in some circumstances at least there
exists more than one set of confidence intervals, and it is now necessary to consider whether
any particular set can be regarded as better than the others in any useful sense. The
problem is analogous to that of estimators, where we found that in general there are many
different estimators for a parameter, but that we could sometimes find one (such as that
with minimum variance) which was superior to the rest.

In Example 19.1 the problem presented itself in rather a specialised form. We found
that for the intervals based on the mean $\bar{x}$ there were infinitely many sets of intervals
according to the way in which we selected $\alpha_0$ and $\alpha_1$ (subject to the condition that
$\alpha_0 + \alpha_1 = 1 + \alpha$). Among these the central intervals are obviously the shortest, for a
given range will include the greatest area of the normal curve if it is centred at the mean
of the curve. We might reasonably say that the central intervals are the best among
those determined by $\bar{x}$.

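But it does not follow that they are the shortest of all possible intervals, or even that
such a shortest set exists. It might also happen that for two sets of intervals $c_1$ and $c_2$
those of $c_1$ are shorter than those of $c_2$ in part of the range of $x$'s and longer in other parts.

19.12. We will therefore consider sets of intervals which are shortest on the average.
That is to say, if

$$\delta = t_1 - t_0$$

we require that

$$\int \delta\, dF = \text{minimum}, \tag{19.10}$$

where the integral is taken over all $x$'s and is therefore equivalent to

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \delta\, L\, dx_1 \ldots dx_n. \tag{19.11}$$

We now prove a theorem which is very similar to the result that maximum-likelihood
estimators in the limit have minimum variance, namely that in a certain class of intervals
the method of 19.10 gives those which are shortest on the average.

Let $h(x, \theta)$ be a function which has a zero mean value and is such that the sum of
a number of similar functions obeys the Central Limit Theorem. Then

$$\zeta = \frac{\sum_{j=1}^{n} h(x_j, \theta)}{\sqrt{n \operatorname{var} h}} \tag{19.12}$$

is normally distributed in the limit with zero mean and unit variance. $\psi$ of equation
(19.8) is a member of the class $\zeta$. We prove that the average rate of change of $\psi$ with
respect to $\theta$, for each fixed $\theta$, is numerically greater than that of any $\zeta$ except in the trivial
case in which $h$ is a multiple of $\partial \log f / \partial\theta$.

Writing $g(x, \theta) = \partial \log f / \partial\theta$, we have

$$\frac{\partial\psi}{\partial\theta} = \frac{1}{\sqrt{n \operatorname{var} g}} \left\{ \sum \frac{\partial g}{\partial\theta} - \frac{\sum g}{2 \operatorname{var} g} \frac{\partial \operatorname{var} g}{\partial\theta} \right\}, \tag{19.13}$$

$$\frac{\partial\zeta}{\partial\theta} = \frac{1}{\sqrt{n \operatorname{var} h}} \left\{ \sum \frac{\partial h}{\partial\theta} - \frac{\sum h}{2 \operatorname{var} h} \frac{\partial \operatorname{var} h}{\partial\theta} \right\}. \tag{19.14}$$

Hence

$$E\left(\frac{\partial\psi}{\partial\theta}\right) = \frac{1}{\sqrt{n \operatorname{var} g}} \left\{ \sum E\left(\frac{\partial g}{\partial\theta}\right) - \frac{1}{2 \operatorname{var} g} \frac{\partial \operatorname{var} g}{\partial\theta} \sum E(g) \right\}.$$

Now $E(g) = 0$ and

$$E\left(\frac{\partial g}{\partial\theta}\right) = -E(g^2).$$

Hence

$$E\left(\frac{\partial\psi}{\partial\theta}\right) = -\frac{n E(g^2)}{\sqrt{n \operatorname{var} g}} = -\sqrt{n \operatorname{var} g} = A_1, \text{ say}. \tag{19.15}$$

Similarly,

$$E\left(\frac{\partial\zeta}{\partial\theta}\right) = \frac{n E(\partial h / \partial\theta)}{\sqrt{n \operatorname{var} h}} = A_2, \text{ say}, \tag{19.16}$$

and, since $E(h) = 0$ for all $\theta$, differentiation under the integral sign gives

$$E\left(\frac{\partial h}{\partial\theta}\right) = -E(hg) = -\operatorname{cov}(h, g). \tag{19.17}$$

Hence

$$A_1^2 - A_2^2 = n \operatorname{var} g - \frac{n}{\operatorname{var} h} \operatorname{cov}^2(h, g) = \frac{n}{\operatorname{var} h} \{\operatorname{var} h \operatorname{var} g - \operatorname{cov}^2(h, g)\}.$$

Thus, unless $h$ is a multiple of $g$, we have

$$A_1^2 > A_2^2,$$

which was to be proved.

Now if $\psi_\alpha$ is a value such that

$$\frac{1}{\sqrt{2\pi}} \int_{-\psi_\alpha}^{\psi_\alpha} e^{-\frac{1}{2}x^2}\, dx = \alpha, \tag{19.18}$$

the upper and lower confidence points for central intervals are $\pm\psi_\alpha$ and the values of $\theta$
are the solutions of

$$\frac{\sum g(x_j, \theta)}{\sqrt{n \operatorname{var} g}} = \pm\psi_\alpha, \tag{19.19}$$

say $t_0$ and $t_1$. Similarly those for any function $h$ are given by

$$\frac{\sum h(x_j, \theta)}{\sqrt{n \operatorname{var} h}} = \pm\psi_\alpha, \tag{19.20}$$

say $u_0$ and $u_1$. The equations for confidence points are equivalent to

$$\psi(t) = \pm\psi_\alpha, \qquad \zeta(u) = \pm\psi_\alpha,$$

or, effectively in large samples, to

$$\psi(\theta_0) + (t - \theta_0)\left(\frac{\partial\psi}{\partial\theta}\right)_{\theta_0} = \pm\psi_\alpha,$$

$$\zeta(\theta_0) + (u - \theta_0)\left(\frac{\partial\zeta}{\partial\theta}\right)_{\theta_0} = \pm\psi_\alpha,$$

where $\theta_0$ is a fixed value of $\theta$. When $t = \theta_0$ and $u = \theta_0$ we have $\psi(\theta_0) = \zeta(\theta_0)$. Hence

$$\frac{t - \theta_0}{u - \theta_0} = \frac{(\partial\zeta / \partial\theta)_{\theta_0}}{(\partial\psi / \partial\theta)_{\theta_0}}. \tag{19.21}$$

Now we have just shown that, on the average, $|\partial\psi / \partial\theta| > |\partial\zeta / \partial\theta|$. Hence, on the average,

$$|t - \theta_0| < |u - \theta_0|,$$

and the confidence limits $t$ are closer together than those of any member of the class $u$ for
any fixed value of $\theta$.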


19.13. A comparison of the result we have just proved and the properties of maximum
likelihood estimators in the limit will show the close relation between confidence
intervals and the theory of estimation developed in Chapter 17. In 17.27 we showed,
by considering the quantity $u = \partial \log L / \partial\theta$, that any estimator $t$ which is in the limit
distributed normally about the true value $\theta_0$ cannot have a variance less than

$$\frac{1}{E\{(\partial \log L / \partial\theta)^2\}},$$

and that the latter quantity, in the limit, is the variance of the maximum likelihood
estimator. It attains the minimal value when $u$ is constant over samples for which $t$ is constant.

The theorem of 19.12 shows that on the average the intervals determined by the
distribution of $u$ are shorter than those based on any other function with a zero mean value
(obeying the usual conditions as to continuity, etc.). Since the maximum likelihood
estimator has minimum variance, we should expect that confidence intervals based on its
distribution would be shorter than others; and this we now see to be so. For if $u$ is constant
over samples of constant $t$, the distribution of $u$ in all samples is equivalent to that of $t$.


Confidence Intervals and Sufficient Estimators 

19.14. Pursuing this line of thought, we are led to inquire whether sufficient
estimators provide confidence intervals for finite samples and whether they have any minimal
properties of the kind we have just established for large samples.

It is easy to see that sufficient estimators do in fact provide confidence intervals.
If $t$ is sufficient for $\theta$, the likelihood function may be put in the form

$$L = f_1(t, \theta)\, f_2(x_1, \ldots, x_n) \tag{19.22}$$

and the distribution of $t$ and $\theta$ is

$$dF = f_1(t, \theta)\, dt. \tag{19.23}$$

Given $\alpha$ we can then find $t_0$ and $t_1$ such that $F(t_0, \theta) = 1 - \alpha_0$ and $F(t_1, \theta) = \alpha_1$ and solve
for $\theta$ in terms of $t_0$ and $\alpha_0$ or $t_1$ and $\alpha_1$, as the case may be. This process will provide the
inequalities of the type we require, a proposition which we shall prove formally below
(19.25).


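Example 19.5

In Example 17.8 we saw that

$$\hat{\theta} = \frac{\bar{x}}{p}$$

is sufficient for $\theta$ in the distribution

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\Gamma(p)\, \theta^p}\, dx, \qquad 0 \le x \le \infty, \quad p > 1,$$

where $p$ is regarded as known. The distribution of $\hat{\theta}$ is in fact

$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat{\theta}^{\,np-1} \exp(-np\hat{\theta}/\theta)}{\Gamma(np)}\, d\hat{\theta}.$$

The distribution function of $m = np\hat{\theta}/\theta$ is the incomplete $\Gamma$-function

$$\frac{\Gamma_m(np)}{\Gamma(np)}.$$

We then find the values of $m$ corresponding to $\alpha_0$ and $\alpha_1$ from the tables, and have

$$P(m \le m_0) = \alpha_0, \qquad P(m \ge m_1) = \alpha_1,$$

whence

$$P\left\{\frac{np\hat{\theta}}{m_0} \le \theta \le \frac{np\hat{\theta}}{m_1}\right\} = \alpha_0 + \alpha_1 - 1 = \alpha.$$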
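
The limits of this example may be computed without tables when the product np is a whole number, for the incomplete gamma ratio then reduces to a finite sum. The sketch below makes that assumption (it is ours, not the text's), finds the required values of m by bisection, and uses hypothetical function names; theta_hat denotes the observed value of the sufficient estimator.

```python
from math import exp


def erlang_cdf(m, k):
    """P(M <= m) for a gamma variate M with integer shape k and unit scale."""
    if m <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= m / j       # term is now m**j / j!
        total += term
    return 1.0 - exp(-m) * total


def erlang_quantile(prob, k, tol=1e-10):
    """Value m with erlang_cdf(m, k) = prob, found by bisection."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, k) < prob:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theta_limits(theta_hat, n, p, alpha0, alpha1):
    """Limits (lower, upper) for theta; n * p is assumed to be an integer."""
    k = n * p
    m0 = erlang_quantile(alpha0, k)         # P(m <= m0) = alpha0
    m1 = erlang_quantile(1.0 - alpha1, k)   # P(m >= m1) = alpha1
    return k * theta_hat / m0, k * theta_hat / m1


if __name__ == "__main__":
    print(theta_limits(theta_hat=2.0, n=5, p=2, alpha0=0.975, alpha1=0.975))
```

For values of np that are not whole numbers a routine for the incomplete gamma function would be needed in place of erlang_cdf.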


19.15. The position in regard to minimal properties of confidence intervals based
on sufficient estimators remains somewhat obscure, but one would expect some such
properties to hold even for finite $n$. Since $u = \partial \log L / \partial\theta$ is constant for constant $t$ when
$t$ is sufficient, the variance of $u$ will be a function of the variance of $t$. This, however, is
not necessarily enough to establish the fact that the corresponding confidence intervals are
shortest on the average. It is imaginable that the confidence intervals derived from its
distribution might be longer on the average than those of some other system. This seems
rather unlikely, at least for the ordinary distributions of statistical theory, but apparently
no proof has been given.


19.16. Neyman (1937b) has proposed to apply the phrase "shortest confidence
intervals" to sets of intervals defined in quite a different way. As it does not appear
that such intervals are necessarily the shortest in the sense of possessing the least length,
even on the average, we shall attempt to avoid confusion by calling them "most selective."

Consider a set of intervals $c_0$, typified by $\delta_0$, obeying the condition that

$$P\{\delta_0 \,\mathrm{c}\, \theta \mid \theta\} = \alpha, \tag{19.24}$$

where we write $\delta_0 \,\mathrm{c}\, \theta$ (that is, $\delta_0$ "contains" $\theta$) for the more usual $t_0 \le \theta \le t_1$
($t_1 - t_0 = \delta_0$). Let $c_1$ be some other set typified by $\delta_1$ such that

$$P\{\delta_1 \,\mathrm{c}\, \theta \mid \theta\} = \alpha. \tag{19.25}$$

Either set is a permissible set of intervals, as the probability is $\alpha$ in both cases that the
range $\delta$ contains $\theta$.

If now for every $c_1$ we have, for any value $\theta'$ other than the true value,

$$P\{\delta_0 \,\mathrm{c}\, \theta' \mid \theta\} \le P\{\delta_1 \,\mathrm{c}\, \theta' \mid \theta\}, \tag{19.26}$$

$c_0$ is said to be most selective.

19.17. The ideas underlying this definition will be clearer from a reading of Chapters
26 and 27 dealing with the Neyman-Pearson theory of inference. We anticipate them here
to the extent of remarking that the object of most selective intervals is to cover the true
value with assigned probability $\alpha$, but to cover other values as little as possible. We may
say of both $c_0$ and $c_1$ that the assertion $\delta \,\mathrm{c}\, \theta$ is true in proportion $\alpha$ of the cases. What
marks out $c_0$ for choice as the most selective set is that it covers false values less frequently
than the remaining sets.

The difference between this approach and the one leading to shortest intervals is that
the latter is concerned only with the narrowness of the confidence interval, whereas the
former gives weight to the frequency with which alternative values of $\theta$ are covered. One
concentrates on locating $\theta$ with the smallest margin of error; the other takes into account
the desirability of excluding so far as possible false values of $\theta$ from the interval, so that
mistakes of taking the wrong value are minimised.

19.18. Neyman himself has shown that most selective sets do not usually exist (for
instance, if the distribution is continuous) and has proposed two alternative systems:

(a) most selective one-sided systems (Neyman's "shortest one-sided" sets) which
obey (19.26) only for values of $\theta' - \theta$ which are always positive or always negative;

(b) selective unbiassed systems (Neyman's "short unbiassed" sets) which obey
(19.24) but, in place of (19.26), the further relation

$$P\{\delta \,\mathrm{c}\, \theta \mid \theta\} = \alpha \ge P\{\delta \,\mathrm{c}\, \theta' \mid \theta\}. \tag{19.27}$$

In essence these sets amount to a translation into terms of confidence intervals of
certain ideas in the theory of tests of significance, and we may defer consideration of them
until Chapters 26 and 27 are reached.

Generalisation to the Case of Several Parameters 

19.19. We now proceed to generalise the foregoing theory to the case of several
parameters. Although, to simplify the exposition, we shall deal in detail only with a single
variate, the theory is quite general. We begin by extending our notation and introducing
a geometrical terminology which may be regarded as an elaboration of the diagrams of
Figs. 19.1 and 19.2.

Suppose we have a frequency function of known form depending on $l$ unknown
parameters, $\theta_1, \ldots, \theta_l$, and denoted by $f(x, \theta_1, \ldots, \theta_l)$. We may require to estimate either
$\theta_1$ only or several of the $\theta$'s simultaneously. In the first place we consider only the
estimation of a single parameter. To determine confidence limits we require to find two functions
$u_0$ and $u_1$, dependent on the sample values but not on the $\theta$'s, such that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha, \tag{19.28}$$

where $\alpha$ is the confidence coefficient chosen in advance.

With a sample of $n$ values, $x_1, \ldots, x_n$, we can associate a point in an $n$-dimensional
Euclidean space, and the frequency-distribution will determine a density function for
each such point. The quantities $u_0$ and $u_1$, being functions of the $x$'s, are determined in
this space, and for any given $\alpha$ will lie on two hypersurfaces (the natural extension of the
confidence lines of Fig. 19.1). Between them will lie a Confidence Zone or Region of
Acceptance.

In general we also have to consider a range of values of $\theta$ which are a priori possible.
There will thus be an $l$-dimensional space of $\theta$'s subjoined to the $n$-space, the total region
of variation having $(l + n)$ dimensions; but if we are considering the estimation of $\theta_1$,
this reduces to an $(n + 1)$-space, the other $(l - 1)$ parameters not appearing as variables.

We shall call the sample-space $W$ and denote a point whose co-ordinates are $x_1, \ldots, x_n$
by $E$. We may then write $u_0(E)$, $u_1(E)$ to show that the confidence functions depend
on $E$. The interval $u_1(E) - u_0(E)$ we denote by $\delta(E)$ or $\delta$, and as above we write
$\delta \,\mathrm{c}\, \theta_1$ to denote $u_0 \le \theta_1 \le u_1$. The region of acceptance or confidence zone we denote by $A$,
and may write $E \in \delta$ or $E \in A$ to indicate that the sample-point lies in the interval $\delta$ or
the region $A$.

19.20. In Fig. 19.4 we have shown two axes $x_1$ and $x_2$ and a third axis corresponding
to the variation of $\theta_1$. The sample-space $W$ is thus two-dimensional. For any given $\theta_1$,
say $\theta_1'$, the space $W$ is a hyperplane (or part of it), one such being shown.

Fig. 19.4.

Take any given pair of values $(x_1, x_2)$ and draw through the point so defined a line
parallel to the $\theta_1$-axis, such as $PQ$ in the figure, cutting the hyperplane at $R$. The two
values of $u_0$ and $u_1$ will give two limits to $\theta_1$ corresponding to two points on this line, say
$U$, $V$. Consider now the lines $PQ$ as $x_1$, $x_2$ vary. In some cases $U$, $V$ will lie on opposite
sides of $R$, and $\theta_1'$ lies inside the interval $UV$. In other cases (as for instance in $U'V'$ shown
in the figure) the contrary is true. The totality of points in the former category
determines the region of acceptance $A$, shaded in the figure. If for any point in $A$ we assert
$\delta \,\mathrm{c}\, \theta_1'$, we shall be right; if we assert it for points outside $A$ we shall be wrong.

19.21. Evidently, if the sample-point $E$ falls in the region $A$, the corresponding
$\theta_1'$ lies in the confidence interval and conversely. It follows that the probability of any
fixed $\theta_1'$ lying in the confidence interval is the probability that $E$ lies in $A(\theta_1')$; or in
symbols

$$P\{\delta \,\mathrm{c}\, \theta_1' \mid \theta_1, \ldots, \theta_l\} = P\{u_0 \le \theta_1' \le u_1 \mid \theta_1, \ldots, \theta_l\} = P\{E \in A(\theta_1') \mid \theta_1, \ldots, \theta_l\}. \tag{19.29}$$

From this it follows that if the confidence functions are determined so that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha$$

we shall have, for all $\theta_1$,

$$P\{E \in A(\theta_1) \mid \theta_1, \ldots, \theta_l\} = \alpha. \tag{19.30}$$

It follows also that for no $\theta_1$ can the region $A$ be empty, for if it were the probability in
(19.30) would be zero.







19.22. If the functions $u_0$ and $u_1$ are single-valued and determined for all $E$, then
any sample-point will fall into at least one region of acceptance. For on the line $PQ$
corresponding to the given $E$ we take an $R$ between $U$ and $V$, and this will define a value of
$\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$.

More importantly, if a sample-point falls in the regions $A(\theta_1')$ and $A(\theta_1'')$ corresponding
to two values of $\theta_1$, $\theta_1'$ and $\theta_1''$, it will fall in the region $A(\theta_1''')$, where $\theta_1'''$ is any value
between $\theta_1'$ and $\theta_1''$. For we have

$$u_0 \le \theta_1' \le u_1, \qquad u_0 \le \theta_1'' \le u_1,$$

and hence

$$u_0 \le \theta_1' \le \theta_1'' \le u_1$$

if $\theta_1''$ is the greater, and hence

$$u_0 \le \theta_1' \le \theta_1''' \le \theta_1'' \le u_1$$

or

$$u_0 \le \theta_1''' \le u_1.$$

Further, if a sample-point falls in any of the regions $A(\theta_1)$ for the range of $\theta$-values
$\theta_1' < \theta_1 < \theta_1''$, it must also fall within $A(\theta_1')$ and $A(\theta_1'')$.

19.23. The conditions referred to in the two previous sections are necessary. We
now prove that they are sufficient, that is to say: if for each value of $\theta_1$ there is defined
in the sample-space $W$ a region $A$ such that

(1) $P\{E \in A(\theta_1) \mid \theta_1\} = \alpha$, whatever the value of the $\theta$'s;

(2) For any $E$ there is at least one $\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$;

(3) If $E \in A(\theta_1')$ and $E \in A(\theta_1'')$, then $E \in A(\theta_1''')$ for any $\theta_1'''$ between $\theta_1'$ and $\theta_1''$;

(4) If $E \in A(\theta_1)$ for any $\theta_1$ satisfying $\theta_1' < \theta_1 < \theta_1''$, $E \in A(\theta_1')$ and $E \in A(\theta_1'')$;

then $u_0$ and $u_1$, viz. confidence limits for $\theta_1$, are given by taking the lower and upper bounds
of values of $\theta_1$ for which a fixed sample-point falls within $A(\theta_1)$. They are determinate
and single-valued for all $E$, $u_0 \le u_1$, and $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$ for all $\theta_1$.

The lower and upper bounds exist in virtue of condition (2), and the lower is not greater
than the upper. We have then merely to show that $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$, and for
this it is sufficient, in virtue of condition (1), to show that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = P\{E \in A(\theta_1) \mid \theta_1\}. \tag{19.31}$$

We already know that if $E \in A(\theta_1)$ then $u_0 \le \theta_1 \le u_1$; and our result will be established
if we demonstrate the converse.

Suppose it is not true that when $u_0 \le \theta_1 \le u_1$, $E \in A(\theta_1)$. Let $E$ be a point outside
$A(\theta_1)$ for which $u_0 \le \theta_1 \le u_1$. Then must either $u_0 = \theta_1$ or $u_1 = \theta_1$ or both; for otherwise,
$u_0$ and $u_1$ being the bounds of the values of $\theta_1$ for which $E$ lies in $A(\theta_1)$, there would
exist values $\theta_1'$ and $\theta_1''$ such that $E \in A(\theta_1')$ and $E \in A(\theta_1'')$ and

$$u_0 \le \theta_1' < \theta_1 < \theta_1'' \le u_1,$$

so that, from condition (3), $E \in A(\theta_1)$, which is contrary to assumption.

Thus $u_0 = \theta_1$ or $u_1 = \theta_1$ or both. If both, then $E$ must fall in $A(\theta_1)$, for $u_0$ and $u_1$
are the bounds of $\theta$-values for which this is so, and if they coincide their common value
must be so. Finally, if $u_0 = \theta_1 < u_1$ (and similarly if $u_0 < \theta_1 = u_1$) we see that for
every value $\theta_1^*$ with $u_0 < \theta_1^* < u_1$, $E$ must fall in $A(\theta_1^*)$ from condition (3), and hence,
from condition (4), $E$ must fall in $A(\theta_1')$ and $A(\theta_1'')$ where $\theta_1' = u_0$ and $\theta_1'' = u_1$.
Hence it falls in $A(\theta_1)$.
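
The construction of 19.23 (limits as the lower and upper bounds of the parameter values whose acceptance regions contain the sample-point) can be sketched numerically. The sketch below is an illustration only. It assumes the simple case of Example 19.1 (normal mean, unit variance), takes the acceptance region to be the set of sample means within $2/\sqrt{n}$ of $\theta$, and searches a finite grid of $\theta$-values; the function names and the grid are hypothetical.

```python
import math


def in_acceptance_region(xbar, theta, n, half_width=2.0):
    """True when the sample mean lies in A(theta), taken here (an assumption)
    to be the set of means within half_width / sqrt(n) of theta."""
    return abs(xbar - theta) <= half_width / math.sqrt(n)


def limits_from_regions(xbar, n, grid):
    """Lower and upper bounds of the grid values of theta whose acceptance
    region contains the observed sample-point."""
    accepted = [theta for theta in grid if in_acceptance_region(xbar, theta, n)]
    if not accepted:
        # Condition (2) of 19.23 excludes this case when the grid is wide enough.
        raise ValueError("no acceptance region contains the sample-point")
    return min(accepted), max(accepted)


# theta from -5 to 5 in steps of 0.001
grid = [i / 1000.0 for i in range(-5000, 5001)]
u0, u1 = limits_from_regions(xbar=0.3, n=16, grid=grid)
print(u0, u1)  # close to 0.3 - 2/4 and 0.3 + 2/4
```

Because these regions satisfy conditions (3) and (4), the accepted grid values form an unbroken run, and its two ends are the limits.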

19.24. The foregoing theorem gives us a formal solution of the problem of finding
confidence intervals in the general case, but it does not provide a method of finding the
intervals in particular instances. In practice we have three lines of approach: (1) to use
sufficient estimators, (2) to adopt the process known as “studentisation,” and (3) to
“guess” a set of intervals in the light of general knowledge and experience and to verify
that they do or do not satisfy the required conditions.

19.25. Consider the use of sufficient estimators in the general case. If $t_1$ is sufficient
for $\theta_1$ we have

$$L = L_1(t_1, \theta_1) \, L_2(x_1 \ldots x_n, \theta_2 \ldots \theta_l). \tag{19.32}$$

The locus $t_1 = $ constant determines a series of hypersurfaces in the sample-space $W$. If
we regard these hypersurfaces as determining regions in $W$, then $t_1 \le k$, say, determines
a fixed region $K$. The probability that $E$ falls in $K$ is then clearly dependent only on
$t_1$ and $\theta_1$. By appropriate choice of $k$ we can determine $K$ so that

$$P\{E \in K \mid \theta_1\} = \alpha,$$

and hence set up regions of acceptance based on values of $t_1$. We can do so, moreover,
in an infinity of ways, according to the values selected for $\alpha_0$ and $\alpha_1$.


Studentisation 

19.26. In Example 19.1 we considered a simplified problem of estimating the mean
in samples from a normal population with unit variance. Suppose now that we require
to determine confidence limits for the mean $\mu$ in samples from

$$dF = \frac{1}{\sigma\sqrt{(2\pi)}} \exp\left\{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right\} dx.$$

The approach of Example 19.1 would lead us to the conclusion that, for confidence coefficient
0.9545 and central intervals,

$$P\left\{\bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}} \,\Big|\, \sigma\right\} = 0.9545.$$

But we cannot now say that the confidence limits are $\bar{x} \pm 2\sigma/\sqrt{n}$ because $\sigma$ is unknown.

Consider then the distribution of $z = \dfrac{\bar{x} - \mu}{s}$, where $s^2$ is the sample variance. This
is known to be the “Student” form

$$dF \propto \frac{dz}{(1 + z^2)^{n/2}}.$$

(Cf. Example 10.6, vol. I, p. 239.) Given $\alpha$, we can now find $z_0$ and $z_1$ such that

$$\int_{-z_1}^{z_0} dF = \alpha,$$

and hence

$$P\{-z_1 \le z \le z_0\} = \alpha,$$

which is equivalent to

$$P\{\bar{x} - s z_0 \le \mu \le \bar{x} + s z_1\} = \alpha.$$

Hence we may say that $\mu$ lies in the range $\bar{x} - s z_0$ to $\bar{x} + s z_1$ with confidence coefficient
$\alpha$, the range now being independent of either $\mu$ or $\sigma$. In fact, owing to the symmetry of
“Student’s” distribution, $z_0 = z_1$, but this is an accidental circumstance peculiar to the
present case.
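
The claim that the studentised limits hold with the stated confidence coefficient whatever the values of $\mu$ and $\sigma$ can be checked by simulation. The sketch below is illustrative only. It assumes samples of five, takes $s^2$ with divisor $n$ so that $z = t/\sqrt{(n-1)}$, and uses the tabulated 97.5 per cent point 2.776 of “Student’s” $t$ with 4 degrees of freedom; the function names, seed and number of trials are hypothetical choices.

```python
import math
import random

SAMPLE_SIZE = 5
T_975_4DF = 2.776  # tabulated 97.5% point of Student's t, 4 degrees of freedom


def studentised_limits(sample, t_point=T_975_4DF):
    """Central limits xbar -/+ s * z0, with s^2 the sample variance (divisor n)
    and z0 = t_point / sqrt(n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    z0 = t_point / math.sqrt(n - 1)
    return mean - s * z0, mean + s * z0


def coverage(mu, sigma, trials=20000, seed=1):
    """Proportion of simulated normal samples whose limits contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
        lower, upper = studentised_limits(sample)
        if lower <= mu <= upper:
            hits += 1
    return hits / trials


print(coverage(0.0, 1.0), coverage(3.0, 10.0))  # both should be near 0.95
```

The two calls use very different parent variances, yet the observed proportions should agree to within sampling error, which is the point of studentisation.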
19.27. The possibility of finding confidence intervals in this case arose from our
being able to find a statistic $z$, depending only on the parameter under estimate, whose
distribution did not contain $\sigma$. A scale parameter can often be eliminated in this way,
although the resulting distributions are not always easy to handle. If, for instance, we
have a statistic $t$ which is of degree $p$ in the variables, then $t/s^p$ is of degree zero, and its
distribution must be independent of the scale parameter. When a statistic is reduced
to independence of the scale in this way it is said to be “studentised,” after “Student”
(W. S. Gosset), who was the first to perceive the significance of the process.

19.28. It is interesting to consider the relation between the studentised mean-statistic
and confidence zones based on sufficient estimators in the normal case. The
distribution of means and variances in normal samples is

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right\} d\bar{x} \cdot \frac{s^{n-2}}{\sigma^{n-1}} \exp\left\{-\frac{n s^2}{2\sigma^2}\right\} ds \tag{19.33}$$

and $\bar{x}$, $s$ are jointly sufficient for $\mu$, $\sigma$. In the sample space $W$ the regions of constant $\bar{x}$
are hyperplanes and those of constant $s$ are hyperspheres. If we fix $\bar{x}$ and $s$ the sample-point
$E$ lies on a hypersphere of $(n - 2)$ dimensions. Choose an area on this hypersphere
of content $\alpha$. Then the acceptance region will be obtained by combining all such areas
for all $\bar{x}$ and $s$.

One such region is seen to be the “slice” of the sample-space obtained by rotating
the hyperplane passing through the origin and the point $(1, 1 \ldots 1)$ through an angle
$\pi\alpha$ (not $2\pi\alpha$, because a half-turn of the plane covers the whole space).

The situation is illustrated for $n = 2$ in Fig. 19.5.

Fig. 19.5.

For any given $\mu'$ the axis of rotation meets the hyperplane $\mu = \mu'$ in the point
$x_1 = x_2 = \mu'$, and the hypercones $\dfrac{\bar{x} - \mu}{s} = $ constant in the $W$ space become the plane
areas between two straight lines (shaded in the figure). These may be regarded as regions
of acceptance, and one set is that obtained by rotating a plane about the line $x_1 = x_2 = \mu$
through an angle so as to cut off in any plane $\mu = \mu'$ an angle $\dfrac{\pi\alpha}{2}$ on each side of
$x_1 - \mu = -(x_2 - \mu)$.

The boundary planes are given by

$$x_1 - \mu = (x_2 - \mu) \tan\left(\frac{\pi}{4} - \frac{\beta}{2}\right)$$

$$x_1 - \mu = (x_2 - \mu) \tan\left(\frac{\pi}{4} + \frac{\beta}{2}\right)$$

where $\beta = \pi(1 - \alpha)$; or, after a little reduction,

$$\mu = \frac{x_1 + x_2}{2} \pm \frac{x_1 - x_2}{2} \cot\frac{\beta}{2}.$$

$\mu$ then lies in the region of acceptance if

$$\frac{x_1 + x_2}{2} - \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2} \le \mu \le \frac{x_1 + x_2}{2} + \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2}.$$

These are in fact the limits given by “Student’s” distribution for $n = 2$, since the sample
variance then becomes

$$s^2 = \left(\frac{x_1 - x_2}{2}\right)^2$$

and

$$\frac{1}{\pi} \int_{-z_0}^{z_0} \frac{dz}{1 + z^2} = \alpha,$$

so that

$$z_0 = \tan\left(\frac{\pi}{2} - \frac{\beta}{2}\right) = \cot\frac{\beta}{2}.$$
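
For samples of two the limits take the closed form $\tfrac{1}{2}(x_1 + x_2) \pm \tfrac{1}{2}|x_1 - x_2| \cot\tfrac{1}{2}\beta$ with $\beta = \pi(1 - \alpha)$, and this can be checked directly by simulation. The following is a minimal sketch under that reading of the result; the function names, seed and number of trials are hypothetical.

```python
import math
import random


def limits_for_pair(x1, x2, alpha):
    """Limits (x1 + x2)/2 -/+ (|x1 - x2|/2) * cot(beta/2), beta = pi * (1 - alpha)."""
    beta = math.pi * (1.0 - alpha)
    half_width = abs(x1 - x2) / 2.0 / math.tan(beta / 2.0)
    centre = (x1 + x2) / 2.0
    return centre - half_width, centre + half_width


def coverage_for_pairs(mu, sigma, alpha, trials=40000, seed=2):
    """Proportion of simulated normal pairs whose limits contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
        lower, upper = limits_for_pair(x1, x2, alpha)
        if lower <= mu <= upper:
            hits += 1
    return hits / trials


print(coverage_for_pairs(mu=5.0, sigma=3.0, alpha=0.9))  # should be near 0.9
```

With $\alpha = 0.5$ the cotangent is unity and the limits reduce to the two observations themselves, a convenient check on the formula.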


19.29. Tables or diagrams of the confidence intervals for selected values of $\alpha$ have
been given for the following parameters:

(a) the proportion $\varpi$ in the binomial (Clopper and Pearson, 1934);

(b) the parameter of the Poisson distribution (Garwood, 1936; Ricker, 1937);

(c) the correlation coefficient in normal samples (David, 1938a);

(d) the median in samples from any population (K. R. Nair, 1940b).

In addition, results for the mean of a normal population may be obtained from “Student’s”
integral as shown above. Those for the variance of a normal population may be obtained
from the $\Gamma$-function or the equivalent $\chi^2$-integral. For simultaneous estimation of mean
and variance there are difficulties, as we proceed to show.


19.30. It might have been expected that the foregoing theory could be generalised
to give simultaneous pairs of confidence intervals for two unknown parameters when
intervals for each separately cannot be found. Very little progress in this direction has,
however, been made. The difficulty may be illustrated by reference to the joint
distribution of mean and variance (19.33). From the independent distributions of $\dfrac{\bar{x} - \mu}{\sigma}$ and
$\dfrac{s}{\sigma}$ we can, given $\alpha$, $\beta$, find $t_0$, $t_1$ and $u_0$, $u_1$ such that

$$P\left\{-t_1 \le \frac{\bar{x} - \mu}{\sigma} \le t_0\right\} = \alpha, \qquad P\left\{u_0 \le \frac{s}{\sigma} \le u_1\right\} = \beta,$$

where the $t$'s and $u$'s depend only on sample values and $\alpha$, $\beta$ may be chosen at will. The
inequalities are equivalent to

$$\bar{x} - \sigma t_0 \le \mu \le \bar{x} + \sigma t_1 \tag{19.34}$$

$$\frac{s}{u_1} \le \sigma \le \frac{s}{u_0} \tag{19.35}$$

and these give

$$\bar{x} - \frac{t_0}{u_0} s \le \mu \le \bar{x} + \frac{t_1}{u_0} s. \tag{19.36}$$

But can we then infer that

$$P\left\{\bar{x} - \frac{t_0}{u_0} s \le \mu \le \bar{x} + \frac{t_1}{u_0} s\right\} = \gamma, \tag{19.37}$$

where $\gamma$ is a constant dependent on $\alpha$ and $\beta$? We cannot. This equation is, in fact,
not generally true. The fact can be verified by considering the distribution of the statistic
$\bar{x} - ks$ and showing that its distribution function $F(u)$ is not independent of $\mu$ and $\sigma$.


19.31. In the next chapter we shall see that a similar problem, giving rise to Behrens'
test, provides a crucial point of difference between the theory of confidence intervals and 
that of fiducial intervals. All we need say here is that from the point of view of the former 
the problem of simultaneous confidence intervals for several parameters remains unsolved, 
except of course in the degenerate case when we can find independent intervals for each 
parameter separately. 


19.32. In conclusion we indicate without proof a few results which have recently 
been obtained. 

(1) Wilks and Daly (1939b) have generalised the theorem of 19.12 to the case of several
parameters. Under fairly general conditions the confidence regions which are shortest
on the average are given by

$$\sum_{i,j} a^{ij} \frac{\partial \log L}{\partial \theta_i} \frac{\partial \log L}{\partial \theta_j} \le \chi^2_\alpha$$

where $(a^{ij})$ is the inverse matrix to that whose general element is

$$E\left(\frac{\partial \log L}{\partial \theta_i} \frac{\partial \log L}{\partial \theta_j}\right)$$

and $\chi^2_\alpha$ is such that $P\{\chi^2 \le \chi^2_\alpha\} = \alpha$, the probability being calculated from the $\chi^2$-distribution
with $\nu = l$. This is clearly related to the result of 17.46 giving the limiting forms
of variances and covariances of maximum likelihood estimators.

(2) Wald (1942) has considered the problem of large samples from the point of view
of most selective sets (“shortest” in Neyman’s sense) and has proved results somewhat
similar to those of Wilks and Daly.

(3) Wald and Wolfowitz (1939b, 1941c) and Kolmogoroff (1941) have considered the
problem of setting confidence limits to the terminals of an unknown frequency-distribution.

NOTES AND REFERENCES 

When the theory of confidence intervals and that of fiducial intervals were first developed
many statisticians regarded them as equivalent. In papers written between 1930
and 1938 “confidence limits” and “fiducial limits” are often used in the same sense;
and even where a distinction of approach was drawn the results given by the two methods
appeared identical. The case of Behrens’ test, however, provided an illustration where
the methods lead to different results; see the following chapter.

The fiducial approach is due to R. A. Fisher, references being given at the end of
Chapter 20. The approach of the present chapter has been developed mainly by Neyman
(see particularly 1937b), E. S. Pearson, Wilks (1938b, c, 1939a and, with Daly, 1939b),
Wald (1939a, 1942), Welch (1939a), and Bartlett (1936a, 1939a). A number of the references
to Chapters 26 and 27 are also relevant.

Confidence intervals can be obtained for the median and other quantiles which are
independent of the form of distribution. See Thompson (1936), Savur (1937a) and K. R.
Nair (1940b), and compare Exercise 19.5.


EXERCISES 

19.1. Show that for the rectangular population

$$dF = \frac{dx}{\theta}, \qquad 0 \le x \le \theta$$

and confidence coefficient $\alpha$, confidence limits for $\theta$ are $t$ and $t/\psi$, where $t$ is the sample range
and $\psi$ is given by

$$\psi^{n-1} \{n - (n - 1)\psi\} = 1 - \alpha.$$

(Wilks, 1938c.)


19.2. Show that, for the distribution of the previous exercise, confidence limits
for samples of two, $x_1$ and $x_2$, are

$$\frac{x_1 + x_2}{1 + \{1 - \sqrt{(1 - \alpha)}\}}, \qquad \frac{x_1 + x_2}{1 - \{1 - \sqrt{(1 - \alpha)}\}}.$$

(Neyman, 1937b.)


19.3. Show also, in the case of the previous exercises, that if $L$ is the larger of a
sample of two, confidence limits are

$$L, \qquad \frac{L}{\sqrt{(1 - \alpha)}}.$$

(Neyman, 1937b.)

Show further that if $M$ is the largest of samples of four, confidence limits are

$$M, \qquad \frac{M}{(1 - \alpha)^{1/4}}.$$

(For an experimental verification, see Frankel and Kullback, 1940.)
19.4. Show that, for the distribution

$$dF = \theta e^{-x\theta} dx, \qquad 0 \le x \le \infty$$

central confidence limits for large samples with $\alpha = 0.95$ are given by


6 = 




(Wilks, 1938c.) 


19.5. If a frequency function is continuous, the probability that the $k$th of a sample
of $n$ (arranged in ascending order of magnitude) lies in the range $dx$ is

$$\frac{1}{B(k, n - k + 1)} F^{k-1} (1 - F)^{n-k} dF,$$

where $F$ is the distribution function. Deduce that

$$P\{x_k \le M \le x_{n-k+1}\} = 1 - 2 I_{0.5}(n - k + 1, k),$$

where $M$ is the median, and hence show how to determine confidence intervals for $M$ from
the incomplete B-function.

Generalise the result for quantiles. Show that the results do not hold for discontinuous
distributions.

(Thompson, 1936.)
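
The probability in Exercise 19.5 is easily evaluated numerically. The sketch below is offered as an illustration, not as the exercise's intended solution. It relies on the standard identity $I_{0.5}(n - k + 1, k) = P\{X \le k - 1\}$ for a binomial variable $X$ with index $n$ and proportion $\tfrac{1}{2}$, so that only binomial coefficients are required; the function names are hypothetical.

```python
from math import comb


def median_interval_confidence(n, k):
    """P{x_(k) <= M <= x_(n-k+1)} = 1 - 2 * P{Binomial(n, 1/2) <= k - 1}
    for a continuous parent, with 1 <= k <= (n + 1) // 2."""
    if not 1 <= k <= (n + 1) // 2:
        raise ValueError("k must satisfy 1 <= k <= (n + 1) // 2")
    lower_tail = sum(comb(n, j) for j in range(k)) / 2 ** n
    return 1.0 - 2.0 * lower_tail


def widest_rank_for(n, alpha):
    """Largest k whose order-statistic interval still has confidence >= alpha
    (returns None when even k = 1 falls short)."""
    best = None
    for k in range(1, (n + 1) // 2 + 1):
        if median_interval_confidence(n, k) >= alpha:
            best = k
    return best


print(median_interval_confidence(10, 2))  # 1 - 2 * 11/1024
print(widest_rank_for(10, 0.95))
```

For a sample of ten the interval from the second to the ninth ordered observation thus has coefficient $1 - 22/1024$, while the third to the eighth falls below 0.95.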



CHAPTER 20 

FIDUCIAL INFERENCE 


20.1. We now proceed to examine a type of inference known as fiducial. As in
other methods of estimation, given a distribution of known form depending on an unknown
parameter $\theta$, we shall attempt to find limits between which $\theta$ lies in some sense associated
with the theory of probability. To that extent our present approach is similar to the
use of estimators with their associated sampling error and to the use of confidence intervals;
but it is distinct from the latter both in essential ideas and in some of the results to which
it leads.


20.2. Consider samples of $n$ from a normal population of unknown mean $\mu$ and
unit variance. The sample-mean $\bar{x}$ is sufficient for $\mu$ and its distribution is

$$dF = \sqrt{\left(\frac{n}{2\pi}\right)} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\bar{x}. \tag{20.1}$$

In speaking of a distribution in this sense we regard $\mu$ as fixed and consider the totality
of values of $\bar{x}$ derived by random sampling from the population with given $\mu$. The
proportion of samples falling in a range $d\bar{x}$ is then given by (20.1), which holds for each
value of $\mu$.

We now change our viewpoint and consider a different kind of distribution based on
(20.1). If we are given a value of $\bar{x}$ from a sample, what are the values of $\mu$ which could
have given rise to this value to any fixed level of probability? If the deviation $\bar{x} - \mu$ is
written as $h$, we know that the probability of the inequality

$$\bar{x} - \mu \le h \tag{20.2}$$

being true is $\alpha$, where $\alpha$ depends on $h$ and is in fact

$$\alpha = \sqrt{\left(\frac{n}{2\pi}\right)} \int_{-\infty}^{h} \exp\left(-\frac{n h^2}{2}\right) dh. \tag{20.3}$$

Looking at this the other way round, we may say that given any $\alpha$ we can find $h$, a function
of $\alpha$ only, such that

$$\mu \ge \bar{x} - h \tag{20.4}$$

is true with probability $\alpha$. For any fixed $\bar{x}$ this gives us a distribution of $\mu$. Consider
in fact the equation

$$\mu = \bar{x} - h. \tag{20.5}$$

If $\mu$ has a distribution function $F(\mu)$, we have, since (20.4) is true with probability $\alpha$,

$$1 - F(\mu) = \alpha = \sqrt{\left(\frac{n}{2\pi}\right)} \int_{-\infty}^{h} \exp\left(-\frac{n h^2}{2}\right) dh,$$

whence

$$f(\mu) \, d\mu = -\sqrt{\left(\frac{n}{2\pi}\right)} \exp\left(-\frac{n h^2}{2}\right) dh.$$

But in virtue of (20.5), $d\mu = -dh$ and $h = \bar{x} - \mu$. Thus

$$f(\mu) \, d\mu = \sqrt{\left(\frac{n}{2\pi}\right)} \exp\left\{-\frac{n}{2}(\mu - \bar{x})^2\right\} d\mu. \tag{20.6}$$

This is called the fiducial distribution of $\mu$.
20.3. It so happens that in this example the non-differential parts of (20.6) and
(20.1) are the same. This is not essential although it is not infrequent. The crucial
point of difference, however, lies in the appearance of the differential element $d\mu$, relating
to the variation of $\mu$, and the disappearance of $d\bar{x}$ relating to the variation of $\bar{x}$. We have
derived a distribution of the parameter $\mu$ from that of the random variable $\bar{x}$ by
transferring our attention in (20.4) from $\bar{x}$ to $\mu$ and regarding the inequality as still satisfied
with probability $\alpha$.

20.4. We note in the first place that this distribution is not necessarily existent.
When we come to make an inference in any particular case we do not assume that $\mu$ is
itself distributed in the fiducial form in the sense that it has been chosen at random from
an existent population of $\mu$'s of that form. Such a prior distribution, which would be
required for the application of Bayes’ theorem, is not admissible from the point of view
of the frequency theory of probability. The fiducial distribution is a hypothetical one of
conceivable values of $\mu$. We attach probabilities to these values, or rather to values in the
range $d\mu$, by identifying them with the probabilities (based on frequency) which are derived
from the distribution of a sufficient estimator of $\mu$. For this reason the fiducial distribution
is not a frequency-distribution in the ordinary sense; but it is a probability distribution
in its own special sense. We use it to make statements of the kind: among the values
of $\mu$ which are possible, only those in a certain range give rise to the observed $\bar{x}$ with
probability $\alpha$, and hence we will locate $\mu$ in that range.

20.5. In our present example the argument would proceed as follows. From equation
(20.6) and the use of the normal integral, the probability that $\mu - \bar{x}$ does not exceed a
certain $h$ is ascertainable as a function of $h$; for instance,

$$P\left\{\mu - \bar{x} \le \frac{2}{\sqrt{n}}\right\} = 0.9772.$$

If we regard a probability as high as this as acceptable, we may say that $\mu \le \bar{x} + 2/\sqrt{n}$.

This result is equivalent to that given by the theory of confidence intervals, for if
we assert $\mu \le \bar{x} + 2/\sqrt{n}$ we shall be right in the long run in 97.72 per cent. of the cases. This
identity of result is found in most elementary cases where a single parameter is concerned,
but is to be regarded as accidental. In the theory of confidence intervals it is fundamental
(a) that the assertion as to the parameter lying in a given range should be true in an assigned
proportion $\alpha$ of the cases, and (b) that no assumption need be made as to the prior
distribution of the parameter, either in the frequency sense or in the fiducial sense. In fiducial
theory it is not necessary that (a) should be true, but the fiducial distribution is
a fundamental part of the inference.
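
The probability statement of 20.5 can be reproduced from the fiducial distribution, which for unit parent variance is normal about $\bar{x}$ with variance $1/n$. The following sketch is illustrative; it uses `statistics.NormalDist` from the Python standard library, and the function names and the numerical values of $\bar{x}$ and $n$ are arbitrary assumptions.

```python
from math import sqrt
from statistics import NormalDist


def fiducial_distribution_of_mean(xbar, n):
    """Fiducial distribution of mu for unit parent variance: normal about
    xbar with standard deviation 1 / sqrt(n)."""
    return NormalDist(mu=xbar, sigma=1.0 / sqrt(n))


def fiducial_probability_not_exceeding(xbar, n, h):
    """Fiducial probability that mu - xbar does not exceed h."""
    return fiducial_distribution_of_mean(xbar, n).cdf(xbar + h)


n, xbar = 25, 1.3
p = fiducial_probability_not_exceeding(xbar, n, 2.0 / sqrt(n))
upper_limit = fiducial_distribution_of_mean(xbar, n).inv_cdf(p)
print(round(p, 4), round(upper_limit, 4))
```

The probability does not depend on the particular $\bar{x}$ or $n$ chosen, since $h$ is measured in units of $1/\sqrt{n}$; the upper limit recovered from the inverse function is $\bar{x} + 2/\sqrt{n}$.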

20.6. There is a further distinction between the two theories. In that of confidence
intervals it is possible to have two entirely different sets for the same parameter, and in
fact part of that theory is devoted to finding “best” sets among the possible ones. In
fiducial theory such a state of affairs must not be possible, for different limits would imply
different fiducial distributions for the same parameter on the same evidence. This is avoided
by confining fiducial distributions to those based on sufficient estimators, or more generally
on a set of estimators which together avoid all loss of information. Since such estimators
alone contain all the information relevant to the problem of estimation they alone can
give the fiducial distributions accurately. It follows, of course, that where no sufficient
estimator (or estimator with complete set of ancillary estimators) can be found, the
fiducial method is inapplicable.


20.7. Generally, let $F(t, \theta)$ be the distribution function of a sufficient estimator $t$
for a parameter $\theta$. Then for the frequency distribution of $t$ we have

$$dF = \frac{\partial F(t, \theta)}{\partial t} dt. \tag{20.7}$$

$F(t, \theta)$ is the probability that a random value of the estimator does not exceed a given
value $t$. In accordance with the fiducial principle, this may be equated to the probability
that for fixed $t$ the value of the parameter will exceed $\theta$, so that for the fiducial distribution of $\theta$ we have

$$dF = \frac{\partial}{\partial \theta}\{1 - F(t, \theta)\} \, d\theta = -\frac{\partial F(t, \theta)}{\partial \theta} d\theta. \tag{20.8}$$

This shows the general relation between the frequency-distribution of the estimator and
the fiducial distribution of the parameter.
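
The relation (20.8), that the fiducial density is minus the derivative of the estimator's distribution function with respect to the parameter, can be checked numerically in the normal example of 20.2. The sketch below is an illustration under that assumption: it differentiates by central differences and compares the result with the normal density about the observed value with variance $1/n$. The function names and the step length are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def estimator_cdf(t, theta, n):
    """F(t, theta): distribution function of the mean of n observations
    from a normal population with mean theta and unit variance."""
    return NormalDist(mu=theta, sigma=1.0 / sqrt(n)).cdf(t)


def fiducial_density(t, theta, n, step=1e-5):
    """Central-difference approximation to -dF(t, theta)/d(theta), as in (20.8)."""
    upper = estimator_cdf(t, theta + step, n)
    lower = estimator_cdf(t, theta - step, n)
    return -(upper - lower) / (2.0 * step)


t_observed, n = 0.7, 10
for theta in (0.2, 0.7, 1.1):
    numerical = fiducial_density(t_observed, theta, n)
    closed_form = NormalDist(mu=t_observed, sigma=1.0 / sqrt(n)).pdf(theta)
    print(theta, numerical, closed_form)  # the last two columns should agree closely
```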


Example 20.1 

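If $p$ is known, the estimator $\hat{\theta} = \dfrac{\bar{x}}{p}$ is sufficient for $\theta$ in samples from

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\theta^p \, \Gamma(p)} dx, \qquad 0 \le x \le \infty$$

the distribution of $\hat{\theta}$ being, in fact,

$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat{\theta}^{np-1}}{\Gamma(np)} \exp\left(-\frac{np\hat{\theta}}{\theta}\right) d\hat{\theta}.$$

(Cf. Example 17.8.) We may write this in the form

$$dF = \frac{1}{\Gamma(np)} \left(\frac{np\hat{\theta}}{\theta}\right)^{np-1} \exp\left(-\frac{np\hat{\theta}}{\theta}\right) d\left(\frac{np\hat{\theta}}{\theta}\right). \tag{20.9}$$

It is then clear that, since, with $t = np\hat{\theta}/\theta$,

$$\frac{\partial F}{\partial \theta} = \frac{\partial F}{\partial t} \frac{\partial t}{\partial \theta},$$

the corresponding fiducial distribution of $\theta$ is

$$dF = \frac{1}{\Gamma(np)} \left(\frac{np\hat{\theta}}{\theta}\right)^{np-1} \exp\left(-\frac{np\hat{\theta}}{\theta}\right) \frac{np\hat{\theta}}{\theta^2} d\theta, \tag{20.10}$$

which may also be put in the form (20.9), provided that we interpret the differential element
now as relating to $\theta$ and not to $\hat{\theta}$. It will be noticed that we have replaced $d\hat{\theta}$ by $\dfrac{\hat{\theta} \, d\theta}{\theta}$,
not merely by $d\theta$.

From the fiducial distribution (20.10) we can find the probability that $\theta$ lies in a certain
range dependent on the observed $\hat{\theta}$ and the chosen probability $\alpha$. This is in fact the same
range that we should obtain by applying confidence intervals to (20.9). Once again the
results of the two methods are the same.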
Fiducial Inference based on “Student's” Distribution

20.8. Consider now the estimation of the mean $\mu$ in samples from a normal population
with unknown variance $\sigma^2$. The treatment of 20.2 is no longer of use, for it would
result in a fiducial distribution of $\mu$ containing the unknown $\sigma$. We therefore “studentise”
the problem by considering the distribution of

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s'} \tag{20.11}$$

which is independent of $\sigma$, being in fact

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu + 1)}} \tag{20.12}$$

where $\nu = n - 1$. Here $s'^2$ is the unbiassed estimate of the parent variance,

$$s'^2 = \frac{1}{n - 1} \Sigma (x - \bar{x})^2.$$

The distribution of $t$ may be written

$$dF \propto \frac{d\left\{\dfrac{(\bar{x} - \mu)\sqrt{n}}{s'}\right\}}{\left\{1 + \dfrac{(\bar{x} - \mu)^2 n}{s'^2 (n - 1)}\right\}^{\frac{1}{2}n}}. \tag{20.13}$$

The fiducial distribution is then

$$dF \propto \frac{d\mu}{\left\{1 + \dfrac{(\bar{x} - \mu)^2 n}{s'^2 (n - 1)}\right\}^{\frac{1}{2}n}}. \tag{20.14}$$

In the usual way we can find two constants, for any given $\alpha$, such that, from (20.14),

$$P\{\mu_0 \le \mu \le \mu_1\} = \alpha, \tag{20.15}$$

the probability being based on (20.14) and therefore to be understood in the fiducial sense.
Had we worked with (20.12) or (20.13) we should have found $t_1$, $t_0$ such that

$$P\{-t_1 \le t \le t_0\} = \alpha,$$

which is equivalent to

$$P\left\{\bar{x} - \frac{s' t_0}{\sqrt{n}} \le \mu \le \bar{x} + \frac{s' t_1}{\sqrt{n}}\right\} = \alpha. \tag{20.16}$$

This may be interpreted in the sense of confidence intervals, i.e. that in asserting the
inequality in (20.16) we should be right in a proportion $\alpha$ of the cases in the long
run. (20.15) does not rest on this statement as to frequency, though the limits to which
it leads are the same and the statement happens to be true.
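
That the fiducial limits from the “Student” form coincide with the limits $\bar{x} \pm s' t/\sqrt{n}$ can be illustrated by integrating the fiducial density numerically. The sketch below is a rough check under stated assumptions: $n = 5$, the tabulated 97.5 per cent point 2.776 of $t$ with 4 degrees of freedom, and Simpson's rule for the integration. All function names and the numerical values of $\bar{x}$ and $s'$ are hypothetical.

```python
import math


def student_density(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    constant = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return constant * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)


def simpson(function, a, b, intervals=2000):
    """Composite Simpson's rule; intervals must be even."""
    h = (b - a) / intervals
    total = function(a) + function(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * function(a + i * h)
    return total * h / 3.0


def fiducial_probability(xbar, s_prime, n, mu_low, mu_high):
    """Fiducial probability that mu lies between mu_low and mu_high, found by
    the change of variable t = (xbar - mu) * sqrt(n) / s'."""
    scale = s_prime / math.sqrt(n)
    t_low = (xbar - mu_high) / scale
    t_high = (xbar - mu_low) / scale
    return simpson(lambda t: student_density(t, n - 1), t_low, t_high)


xbar, s_prime, n, t_point = 10.0, 2.0, 5, 2.776
half_width = t_point * s_prime / math.sqrt(n)
print(fiducial_probability(xbar, s_prime, n, xbar - half_width, xbar + half_width))  # near 0.95
```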


20.9. The case we have just discussed raises a new point. Is it still true that
the fiducial distribution is unique, and is it consistent with the distributions of $\mu$ and $\sigma$
separately? The distribution is based only on the sufficient estimators $\bar{x}$ and $s'$ (which
are jointly but not separately sufficient for $\mu$ and $\sigma$) and we should expect this to be so.
But the matter requires investigation, for we are here using a fiducial distribution based on
two estimators.
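The simultaneous distribution of $\bar{x}$ and $s'$ is

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\bar{x} \cdot \frac{s'^{\,n-2}}{\sigma^{n-1}} \exp\left\{-\frac{(n - 1) s'^2}{2\sigma^2}\right\} ds'. \tag{20.17}$$

If we were considering fiducial limits for $\mu$ with known $\sigma$ we should use the distribution

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\bar{x}.$$

If we were considering fiducial limits for $\sigma$ with known $\mu$ we should not use the other factor
in (20.17),

$$dF \propto \frac{s'^{\,n-2}}{\sigma^{n-1}} \exp\left\{-\frac{(n - 1) s'^2}{2\sigma^2}\right\} ds', \tag{20.18}$$

for in such circumstances $s'$ is not sufficient for $\sigma$, the appropriate estimator being
$\dfrac{1}{n} \Sigma (x - \mu)^2$. The question is, what form of fiducial distribution must hold for $\sigma$ in order
that the “Student” form (20.14) should hold for $\mu$ when $\sigma$ is unknown?

Suppose the fiducial distribution is $f(s', \sigma) \, d\sigma$. We have then for the joint fiducial
distribution of $\mu$ and $\sigma$,

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\mu \, f(s', \sigma) \, d\sigma.$$

We have therefore to solve

$$\int_0^\infty \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} f(s', \sigma) \, d\sigma = \frac{k}{\left\{1 + \dfrac{n(\bar{x} - \mu)^2}{s'^2 (n - 1)}\right\}^{\frac{1}{2}n}} \tag{20.19}$$

where $k$ is some constant. Putting $(\mu - \bar{x})^2 = a$, $\dfrac{n}{2\sigma^2} = \beta$, we have then to solve

$$\int_0^\infty e^{-a\beta} \phi(\beta) \, d\beta = \frac{k}{\left\{1 + \dfrac{na}{(n - 1) s'^2}\right\}^{\frac{1}{2}n}},$$

where $\phi(\beta) \, d\beta$ stands for $\dfrac{1}{\sigma} f(s', \sigma) \, d\sigma$ expressed in terms of $\beta$. Regarding $a$ as the complex
quantity $it$ we see that $\phi$ is the frequency function whose characteristic function is given by
the expression on the right, from which we find

$$f(s', \sigma) \propto \frac{1}{\sigma^n} \exp\left\{-\frac{(n - 1) s'^2}{2\sigma^2}\right\}$$

or, on evaluation of the constant,

$$f(s', \sigma) \, d\sigma = \frac{2}{\Gamma\{\frac{1}{2}(n - 1)\}} \left\{\frac{(n - 1) s'^2}{2}\right\}^{\frac{1}{2}(n-1)} \frac{1}{\sigma^n} \exp\left\{-\frac{(n - 1) s'^2}{2\sigma^2}\right\} d\sigma. \tag{20.20}$$

This, then, is the fiducial distribution which $\sigma$ must obey. We should have arrived at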






the same result had we taken (20.18) and transformed it to the fiducial form, as if it related
to $s'$ and $\sigma$ only and the former were sufficient for the latter.

It appears, then, that in this case at least the fiducial method gives consistent results
when two parameters are involved. The general problem of many parameters presents
difficulties and has not been elucidated to any great extent.
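
The consistency asserted in 20.9 can be illustrated numerically. If the fiducial density of $\sigma$ is taken proportional to $\sigma^{-n} \exp\{-(n-1)s'^2/2\sigma^2\}$, then integrating the joint fiducial form over $\sigma$ should give a function of $\mu$ proportional to the “Student” form. The sketch below assumes those two forms, substitutes $u = 1/\sigma$ and uses Simpson's rule; the function names and numerical values are hypothetical.

```python
import math


def integrated_over_sigma(mu, xbar, s_prime, n, intervals=4000):
    """Integral over sigma of (1/sigma) exp{-n (xbar - mu)^2 / (2 sigma^2)}
    times sigma^(-n) exp{-(n - 1) s'^2 / (2 sigma^2)}.
    With u = 1/sigma this becomes the integral of u^(n-1) exp(-c u^2 / 2)."""
    c = n * (xbar - mu) ** 2 + (n - 1) * s_prime ** 2
    upper = 12.0 / math.sqrt(c)  # the integrand is negligible beyond this point
    h = upper / intervals

    def integrand(u):
        return u ** (n - 1) * math.exp(-0.5 * c * u * u)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def student_form(mu, xbar, s_prime, n):
    """Non-differential part of the fiducial distribution of mu, unnormalised."""
    return (1.0 + n * (xbar - mu) ** 2 / ((n - 1) * s_prime ** 2)) ** (-n / 2.0)


xbar, s_prime, n = 2.0, 1.5, 6
ratios = [integrated_over_sigma(mu, xbar, s_prime, n) / student_form(mu, xbar, s_prime, n)
          for mu in (2.0, 3.0, -0.5, 6.0)]
print(ratios)  # the ratios should all be equal, up to integration error
```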

The Logic of Fiducial Inference 

20.10. The notion of fiducial probability was introduced by Fisher (1930) for the
case of a single parameter. Regarding the estimate $t$ as fixed, Fisher considers the
distribution of values of $\theta$ for which $t$ can be regarded as a representative estimate: representative,
that is to say, in the sense that it could have arisen by random sampling from the
population specified by $\theta$. As pointed out above, this does not mean that we are regarding
the true value of $\theta$ as a member of an existing population. Rather, we are considering the
possible values of $\theta$ and attaching to each value a measure of our confidence in it, based
on the probability that it could have given rise to the observed $t$.

If I interpret him correctly, Fisher would regard a fiducial distribution as a frequency-distribution.
This implies that $\theta$ is regarded as a random variable. It appears to me,
however, that it is not a random variable in the ordinary sense of the frequency theory
of probability, in which values of $\theta$ either are or can be generated by an actual sampling
process. We can never test whether the fiducial distribution holds in the frequency sense
by drawing a number of values and comparing observation with theory. Nor, in calculating
fiducial limits of the type $\theta = t + h(\alpha)$, do we imply that the proportion of cases
for which $\theta \le t + h$ is true will be $\alpha$ in the long run.

20.11. The reader has a choice of several attitudes towards the foundations of the
fiducial argument: (a) he can accept the argument as involving a new postulate of inference;
(b) he can regard it as sanctioned by the approach of the previous section; or (c) he
can, so far as estimates based on a single parameter are concerned, console himself with
the thought that the results of the process are the same as those given by the theory of
confidence intervals.

20.12. Although Fisher is careful to emphasise the distinction between his own
approach and that based on Bayes’ postulate, it is interesting to note that the theory of
inverse probability as modified by Jeffreys gives results which are in many cases identical
with those of fiducial inference.

In the example of 20.2, for instance, suppose that the prior distribution of $\mu$ is $f(\mu) \, d\mu$.
Then for any given $\bar{x}$ the posterior probability of $\mu$ is

$$dF = f(\mu) \, d\mu \sqrt{\left(\frac{n}{2\pi}\right)} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\}. \tag{20.21}$$

If the total probability is unity we have

$$\int_{-\infty}^{\infty} f(\mu) \sqrt{\left(\frac{n}{2\pi}\right)} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\mu = 1. \tag{20.22}$$

Clearly $f(\mu) = 1$ is a solution, and we may use characteristic functions to show that it is
the only solution. In fact we have from (20.22), writing $it$ for $n\bar{x}$,

$$\int_{-\infty}^{\infty} f(\mu) \exp\left(-\frac{n\mu^2}{2}\right) e^{it\mu} \, d\mu = \sqrt{\left(\frac{2\pi}{n}\right)} \exp\left(-\frac{t^2}{2n}\right).$$

The expression on the right is the characteristic function of $\exp\left(-\dfrac{n\mu^2}{2}\right)$ and hence

$$f(\mu) \exp\left(-\frac{n\mu^2}{2}\right) = \exp\left(-\frac{n\mu^2}{2}\right)$$

or $f(\mu) = 1$.

We have, then, for the posterior probability distribution of $\mu$,

$$dF = \sqrt{\left(\frac{n}{2\pi}\right)} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\mu, \tag{20.23}$$

which is the same as the fiducial distribution. The requirement that $f(\mu) = 1$ is equivalent
to a prior distribution of $\mu$, $dF = d\mu$, which is the form given by Bayes’ postulate for a
parameter which can extend to infinity in either direction.

Example 20.2 

In Example 20.1, a similar argument leads to a prior distribution of $\theta$,

$$dF \propto \frac{d\theta}{\theta}.$$

This is the form given by Jeffreys’ modification of Bayes’ postulate when a parameter
can extend to infinity in only one direction.

It does not appear, however, that fiducial and inverse probability always give the
same results. Consider the distribution of the correlation coefficient in normal samples
(14.14):

$$dF \propto (1 - \rho^2)^{\frac{1}{2}(n-1)} (1 - r^2)^{\frac{1}{2}(n-4)} \frac{d^{\,n-2}}{d(\rho r)^{n-2}} \left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{(1 - \rho^2 r^2)}}\right\} dr. \tag{20.24}$$

The argument of the type we have just employed would require a prior distribution of $\rho$,

$$dF \propto \frac{d\rho}{(1 - \rho^2)^{3/2}},$$

and the resulting posterior distribution (which is equivalent to that obtained by
interchanging $r$ and $\rho$ in (20.24)) is not the same as we should get by using equation (20.8).

Behrens' Test 

20.13. Suppose we have two samples of $n_1$ and $n_2$ members from normal populations
with possibly unequal variances. The fiducial distributions of $\mu_1$ and $\mu_2$ are of the
“Student” form (20.14). Writing

$$\mu_1 = \bar{x}_1 + s_1 u_1$$

$$\mu_2 = \bar{x}_2 + s_2 u_2$$

(where $s_1^2 = s_1'^2/n_1$ and $s_2^2 = s_2'^2/n_2$), we have

$$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 + s_1 u_1 - s_2 u_2. \tag{20.25}$$

If now

$$\varepsilon = \frac{(\mu_1 - \mu_2) - (\bar{x}_1 - \bar{x}_2)}{\sqrt{(s_1^2 + s_2^2)}}, \tag{20.26}$$

$\varepsilon$ depends only on the known quantities $\bar{x}$ and $s'$, and the difference of means $\mu_1 - \mu_2$.
From the fiducial distributions of $\mu_1$ and $\mu_2$ we can find that of $\varepsilon$, and hence make fiducial
statements of the type

$$\bar{x}_1 - \bar{x}_2 - \varepsilon_0 \sqrt{(s_1^2 + s_2^2)} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + \varepsilon_1 \sqrt{(s_1^2 + s_2^2)}. \tag{20.27}$$

The distribution of $\varepsilon$ is not of a simple form. Putting $\tan\psi = \dfrac{s_2}{s_1}$ we see that

$$\varepsilon = u_1 \cos\psi - u_2 \sin\psi, \tag{20.28}$$

so that $\varepsilon$ is distributed fiducially as the weighted difference of two variables, each of which
is distributed as “Student’s” $t$. We have then to find the distribution of

$$\varepsilon = t_1 \cos\psi - t_2 \sin\psi$$

where the joint distribution of $t_1$ and $t_2$ is given by

$$dF \propto \frac{dt_1 \, dt_2}{\left(1 + \dfrac{t_1^2}{\nu_1}\right)^{\frac{1}{2}(\nu_1 + 1)} \left(1 + \dfrac{t_2^2}{\nu_2}\right)^{\frac{1}{2}(\nu_2 + 1)}}. \tag{20.29}$$

The distribution has been studied by Sukhatme (1938b) and in more detail by Fisher
(1941a). Tables are given for various values of $n_1$, $n_2$ and the ratio $s_1'^2/s_2'^2$ (or the equivalent
angle $\psi$) showing the values of $\varepsilon$ corresponding to given probability levels. Some of
the tables are included in the second (1943) edition of Fisher and Yates’ Statistical Tables for
Agricultural, Biological and Medical Research.

20.15. The joint distribution of $s_1'^2$ and $s_2'^2$ is

dF oc exp | — \ (»j — 1) ^ —£(», — 1) da]* da]*. 


Putting 


*'2 


p = and 


«-*{ 


(w, 


St. 2 




*>4 


¥}■ 


we find, on a little reduction, 
dF oc 


(r»i-3) 


dp 


f p (% - 1 ) + » «-! ]* 
1 <£” J 


(«! + **-2) 


iflOh +n, -4) g-w 


(20.30) 


Thus $u$ is distributed (independently of $\rho$) in the Type III form. Further,
$(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)$ is distributed normally about zero mean with variance $\sigma_1^2 + \sigma_2^2$.

Hence, if $\dfrac{\sigma_1^2}{\sigma_2^2} = \theta$, we find that the quotient


{(*i ~ Pi) — (»« - /*.)} * (»i + »„ - 2) _ b* (1 + p) («i + »t — 2) 


W + <« {(», - i) + (». - i)f} (i + o) 


(20.31) 


is distributed as $t^2$ with $n_1 + n_2 - 2$ degrees of freedom. (Cf. Example 10.17, vol. I,
p. 248, for the distribution of a normal variate divided by a Type III variate.)

Now if we knew $\theta$ we could find fiducial (or confidence) limits to $\varepsilon$, and hence to $\mu_1 - \mu_2$,
in the usual way, for the distribution of $\varepsilon$ would then be independent of unknown constants
and ascertainable from “Student’s” integral. Since, however, $\theta$ is not known, we require
in turn the fiducial distribution of this quantity. Since

, = t log 

is distributed in Fisher’s form (cf. Example 10.18, vol. I, p. 249), the required fiducial 
form for $\theta$ can be obtained from that of $z$, which incidentally is equivalent to that of $\rho$
in (20.30). If we express (20.31) as the joint fiducial distribution of $\varepsilon$ and $\theta$ and integrate
out for $\theta$, we shall be left with an equivalent form to that derived from (20.29).

20.16. It also follows from the above that the inequality (20.27) is not satisfied in
proportion $\alpha$ of the cases independently of $\theta$, so that the limits to $\mu_1 - \mu_2$ are not confidence
limits, although they are fiducial limits. It will, in fact, be evident enough from (20.31)
that if we determine $t_0$ and $t_1$ so that the integral of “Student’s” form between those
limits is $\alpha$, then the corresponding limits for $e$, say $e_0$ and $e_1$, are dependent on the variance
ratio $\theta = \sigma_2^2/\sigma_1^2$. This is fairly evident on general grounds, and the point has been put
beyond doubt by both Fisher (1937b) and Neyman (1941a), who have worked out particular
cases of difference.

The fiducial distribution of e (which is an extension by Fisher of a result given by 
Behrens as early as 1929) thus provides a crucial point of difference between the theory of 
fiducial inference and that of confidence intervals. 


20.17. In conclusion, we will indicate the viewpoint of Jeffreys towards the type of 
problem dealt with by “ Student’s ” distribution for limits to the mean and Behrens’ 
distribution for limits to the difference of two means. 

If $H$ denotes the general data, we have for the “Student” distribution

$$P\{dt \mid \mu, \sigma, H\} = \frac{k\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}. \tag{20.32}$$

The expression on the left states the probability that $t$ will lie in a given range $dt$ on the
assumption that $H$ is true, the parent mean being $\mu$ and the parent variance $\sigma^2$. Since
$\mu$ and $\sigma$ do not appear on the right they are irrelevant and may be suppressed, and hence

$$P\{dt \mid H\} = \frac{k\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}. \tag{20.33}$$

Suppose now that we assume that

$$P\{dt \mid \bar{x}, s, H\} = f(t)\,dt. \tag{20.34}$$

Then, as before, $\bar{x}$ and $s$ may be suppressed and we have

$$P\{dt \mid H\} = f(t)\,dt, \tag{20.35}$$

and hence, by comparison with (20.33),

$$P\{dt \mid \bar{x}, s, H\} = \frac{k\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}. \tag{20.36}$$


We can then proceed to find limits to $t$, given $\bar{x}$ and $s$, in the usual way. Jeffreys
emphasises, however, that this depends on a new postulate expressed by (20.34) which, though
natural, is not trivial. It amounts to an assumption that if we are comparing different
distributions, samples from which give different $\bar{x}$’s and $s$’s, the scale of the distribution
of $\mu$ must be taken proportional to $s$ and its mean displaced by the difference of sample
means.
20.18. In a similar way it will be found that to arrive at the Behrens distribution
it is necessary to postulate that

$$P\{dt_1, dt_2 \mid \bar{x}_1, \bar{x}_2, s_1, s_2, H\} = f_1(t_1)\,f_2(t_2)\,dt_1\,dt_2. \tag{20.37}$$

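Jeffreys’ derivation of the Behrens form from Bayes’ theorem would be as follows:
The prior probability of $d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H$ is

$$P\{d\mu_1, d\mu_2, d\sigma_1, d\sigma_2 \mid H\} \propto \frac{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2}{\sigma_1\sigma_2}.$$

The likelihood (denoting the data by $D$) is

$$P\{D \mid \mu_1, \mu_2, \sigma_1, \sigma_2, H\} \propto \frac{1}{\sigma_1^{n_1}\sigma_2^{n_2}} \exp\left[-\frac{n_1}{2\sigma_1^2}\{(\bar{x}_1-\mu_1)^2 + s_1^2\} - \frac{n_2}{2\sigma_2^2}\{(\bar{x}_2-\mu_2)^2 + s_2^2\}\right].$$

Hence, by Bayes’ theorem,

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid D, H\} \propto \frac{1}{\sigma_1^{n_1+1}\sigma_2^{n_2+1}} \exp\left[-\frac{n_1}{2\sigma_1^2}\{(\bar{x}_1-\mu_1)^2 + s_1^2\} - \frac{n_2}{2\sigma_2^2}\{(\bar{x}_2-\mu_2)^2 + s_2^2\}\right] d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2.$$

Integrating out the values of $\sigma_1$ and $\sigma_2$, we find for the posterior distribution of $\mu_1$ and
$\mu_2$ a form which is easily reducible to (20.29).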

20.19. To sum up: so far as concerns problems of estimation the Behrens test is
accurate both in fiducial theory and in the theory of probability propounded by Jeffreys.
But the test does not hold in the theory of confidence intervals. In fact the latter fails
to provide an exact solution to the problem, though we shall see below (21.28) that
approximations are possible. Fisher has criticised confidence intervals on the ground that they
do not give an answer to what is admittedly an important question; but it appears possible
to maintain consistently that some questions may not have an answer.

NOTES AND REFERENCES 

For the general theory of fiducial inference see Fisher (1930a, 1933, 1935a, b, 1936c,
1941a). The difficulties of reconciling Behrens’ test with confidence-interval theory were
noticed by Bartlett (1936a) and led to some controversy, for which see Fisher (1937b,
1939a, 1940c), Bartlett (1939a), Yates (1939/), and Neyman (1941a). For Jeffreys’ views
see his papers of 1937b, 1938c, 1939d and 1940.

For the practical application of Behrens’ distribution see Sukhatme (1938b) and Fisher
(1941a). Behrens himself stated his results explicitly only for the case of equality of sample
number, $n_1 = n_2$, the extension being given by Fisher (1935b).

EXERCISES 

20.1. If $\bar{x}$ is the mean of a sample of $n$ values from

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx,$$

$s'^2$ is equal to $\dfrac{1}{n-1}\Sigma (x - \bar{x})^2$, and $x$ is a further independent sample value, show that

$$t = \frac{x - \bar{x}}{s'} \sqrt{\frac{n}{n+1}}$$

is distributed in “Student’s” form with $\nu = n - 1$. Hence show that fiducial limits
for $x$ are

$$\bar{x} \pm s'\,t_1 \sqrt{\frac{n+1}{n}},$$

where $t_1$ is chosen so that the integral of “Student’s” form between $-t_1$ and $t_1$ is an
assigned probability $\alpha$.

(Fisher, 1935b. This gives an estimate of the next value when $n$ values have
already been chosen, and extends the idea of fiducial limits from parameters
to variates dependent on them.)


20.2. Show similarly that if a sample of $n_1$ values gives mean $\bar{x}_1$ and estimated variance
$s_1'^2$, the fiducial distribution of mean $\bar{x}_2$ and estimated variance $s_2'^2$ in a second sample of $n_2$ is

$$dF \propto \frac{s_2'^{\,n_2-2}\; d\bar{x}_2\, ds_2'}{\left\{(n_1-1)\,s_1'^2 + (n_2-1)\,s_2'^2 + \dfrac{n_1 n_2}{n_1+n_2}(\bar{x}_1-\bar{x}_2)^2\right\}^{\frac12(n_1+n_2-1)}}.$$

Hence, allowing $n_2$ to tend to infinity, derive the simultaneous fiducial distribution of
$\mu$ and $\sigma$.

(Fisher, 1936b.)



CHAPTER 21 

SOME COMMON TESTS OF SIGNIFICANCE 


Tests of Significance 

21.1. We now pass from the problem of estimation to that of significance. The
two are closely allied and in practical problems they both arise together as a rule; but 
it is useful to preserve a distinction between them. In estimation we try to find, with 
greater or less accuracy, the value of some parameter in a population which is known to 
be (or assumed to be) dependent on that parameter. In tests of significance we are given 
some value of a parameter beforehand and wish to decide whether it is acceptable in the 
light of the evidence. This is the distinction in its simplest terms, but of course the 
associated problems become increasingly complex when several parameters are concerned. 

21.2. From one point of view the problem of significance is logically anterior to that 
of estimation. Suppose we have records of the yields of two varieties of wheat grown 
under similar conditions, and are interested in a comparison of the average yields of the 
two. Our first question is whether the observed mean yields indicate any difference between 
the varieties—a matter of significance. Not until significant differences are established 
does our interest turn to the magnitude of the difference, a matter of estimation. Again,
if we have a set of records of only one variety, our primary problem may be to decide 
whether they are consonant with the hypothesis of normality in the parent population, 
whatever its mean and variance ; and only when this point has been settled affirmatively 
do we proceed to estimate those parameters. 

Nevertheless, we have lost very little by taking the problem of estimation first. In 
some practical problems the question of significance is already decided, and in many others 
we use estimates of parameters to test the significance of the latter, in which case estimation 
and significance become different aspects of the same statistical fact.

21.3. We shall consider the general theory of testing statistical hypotheses in Chapters 
26 and 27. That theory is, however, rather abstract, and we anticipate it to some extent 
in this chapter by giving an account of the principal tests in current use, without for the 
moment going too deeply into their rationale. It will be seen later that there are sometimes 
many significance tests which can be applied to the same problem, and that it is possible 
to lay down criteria for deciding which, if any, are the “ best ”. This aspect of the subject 
will not concern us for the present. We shall not discuss whether the tests we describe 
are the best possible (though some of them, in fact, are so) but shall merely present them 
as useful and convenient, albeit perhaps not unique, solutions of our problems. 

21.4. Developments in statistical theory in the last two decades have resulted in 
a great many tests of significance appropriate to special problems. It is not easy to classify 
them and quite impossible to deal extensively with them all. We shall consider them 
under the following heads :— 

(a) Tests of the significance of a specified parameter value. The typical hypothesis
here is that a parameter in a population of known form has a specified value (usually
zero). We wish to know whether the evidence provided by the sample supports the
hypothesis or not.

(b) Tests of goodness of fit. The hypothesis is that the population is of a certain
kind which is either fully specified beforehand or can be “estimated” with the help
of the data. We wish to know whether the sample values fit this population in the
sense that they could have arisen from it by random sampling to any acceptable degree
of probability. This hypothesis is more general than that of (a) since it concerns
the whole distribution function and not merely one of its parameters.

(c) Tests of homogeneity. The hypothesis here concerns two or more populations,
each providing a contribution to the sample. We wish to test whether the populations
have certain parameters in common, or in the extreme case, whether they are identical.
This case can be regarded as an elaboration of (a) where several parameters are
simultaneously tested. In the particular case when only two populations are concerned
we may sometimes reduce it directly to type (a) by considering differences; e.g. if
we are making a comparison of parent means the hypothesis might be that the single
difference of means is zero.

In addition we shall also consider two sets of tests of rather a different kind:

(d) Tests of order of occurrence. The hypothesis here is that the sample members
occurred in random order, and we wish to ascertain whether the observed order indicates
any systematic effects, as, for instance, whether there are any cyclical effects in
time-series. The test here is of the sampling process rather than of parameters of the
parent population.

(e) Conditional tests. The hypothesis may be any one of the above types, but
we restrict the inference to a sub-population for which certain qualities are
determined by the observed sample values. For instance, we may use the distribution
of the sample variance $s^2$ for which the mean $\bar{x}$ is equal to the observed value. In
short the variation of sample values is conditioned. Type (d) may from some points
of view be regarded as a particular case of this type.

It is not intended to convey that the above five categories are mutually exclusive. 
A test of type (a) may, for example, be conditional or non-conditional. The classification 
will, however, provide some sort of articulation for a rather long chapter and serve to 
explain our sequence of treatment. 

Standard Errors 

21.5. For large samples the test of significance of a parameter can usually be carried
out by standard errors. We find an estimator $t$ of the parameter $\theta$ and consider whether
the given value of $\theta$ falls in the range $t_1 \pm k\sqrt{\operatorname{var} t}$, where $t_1$ is the value of $t$ for the observed
sample and $k$ is a constant chosen at will according to a probability $\alpha$. If so we may accept
the value of $\theta$, at least so far as this test is concerned; if not, we reject it.

If the variance of $t$ does not depend on unknown quantities such as other parameters,
this type of inference is justifiable as an application of the theory of confidence intervals.
In accepting $\theta$ when it falls in the range $t_1 \pm k\sqrt{\operatorname{var} t}$, we shall be right in proportion $\alpha$ of
the cases in the long run. As a refinement we may, of course, use non-central intervals
and locate $\theta$ in an asymmetrical range $t_1 - k_0\sqrt{\operatorname{var} t}$ to $t_1 + k_1\sqrt{\operatorname{var} t}$. The test of
significance is equivalent to the estimation of the true value of $\theta$; and it will clearly be better
if the range of estimation is narrower, for then we reject more wrong values of $\theta$.

21.6. If the variance of the estimator $t$ depends on unknown parameters $\theta_1, \ldots, \theta_p$
we can usually substitute estimates of those parameters obtained from the sample itself,
provided that the sample is large. For example, we have for normal samples

$$P\left\{\mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}}\right\} = 0.97725.$$

The sample standard deviation $s$ will differ from $\sigma$ by a quantity of order $1/\sqrt{n}$, so that
to that order

$$P\left\{\mu \le \bar{x} + \frac{2s}{\sqrt{n}}\right\} = 0.97725.$$

The approximation breaks down for small samples, and more accurate methods are required.
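
As a numerical check on the figure 0.97725 quoted above, the following short sketch evaluates the normal distribution function at two standard errors. It is written in Python using only the standard library; the function name `normal_cdf` is our own label for illustration and is not part of the text.

```python
import math

def normal_cdf(z):
    """Distribution function of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability that a normal variate lies below its mean plus two standard errors.
p_two_sigma = normal_cdf(2.0)
print(round(p_two_sigma, 5))
```

The same function gives the probability attached to any other multiple $k$ of the standard error, which is how the constant $k$ of 21.5 would be matched to a chosen probability.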

21.7. The use of standard errors in testing significance has been illustrated in previous
chapters, and we need not enlarge on the process further. We may, however, remark
two things:

(a) That if the distribution of an estimator $t$ tends to normality for large samples
irrespective of the parent form (as, for instance, is the case with the mean and other moments
under very general conditions), it is not necessary that the hypothesis should specify the
parent form. In short, our test of significance is independent of the parent, a valuable
generality which rarely obtains for small samples.

(b) That we have justified the logic of reasoning involving the use of standard errors
by the theory of confidence intervals (and a similar justification can be given in terms
of fiducial intervals if we use an efficient estimator for which the loss of information tends
to zero relative to the total information in large samples). This appears to be the most
satisfactory basis for the use of standard errors. The usual intuitive basis advanced
(necessarily) in introductory textbooks is not easy to defend. For instance, it is customary
to reject a value of $\theta$ if it gives to an observed $t_1$ or greater value a small probability; and
there is no obvious reason why we should base our inference on the improbability of values
greater than $t_1$, namely on the improbability of something which has not occurred (see 21.55
below). Our present approach shows that in fact the use of standard errors can be justified
logically without invoking a new principle of inference.


Significance of the Mean in Normal Samples 

21.8. Suppose we have a sample from a parent population which is known to be
normal, but of whose mean and variance we are ignorant. We wish to test the significance
of a given value of the mean, that is to say, we wish to consider whether the observations
could, to any acceptable probability, have been derived from a population with mean $\mu_0$,
whatever the variance may be.

We calculate the statistic

$$t = \frac{(\bar{x} - \mu_0)\sqrt{\nu}}{s}, \tag{21.1}$$

all the quantities in which are given. We know that the distribution of $t$ is

$$dF = \frac{\Gamma\{\frac12(\nu+1)\}}{\sqrt{\nu\pi}\;\Gamma(\frac12\nu)}\; \frac{dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}, \tag{21.2}$$

and hence can find the probability that our calculated value of $t$ is attained or exceeded.
If this is small we reject $\mu_0$; if not, we accept it. What values are regarded as “small”
for this purpose is a matter of convention, but the most frequently used values are 0.05,
0.01 and 0.001.

From the work of the previous two chapters it will be evident that this type of inference
is the confidence- or fiducial-interval approach in a slightly different form. Given
$\alpha$ we can find $-t_1$ and $t_0$ such that the integral of $dF$ in (21.2) between those limits is $\alpha$.
This gives us confidence or fiducial limits to $\mu$ of the type

$$\bar{x} - \frac{t_0 s}{\sqrt{\nu}} \quad\text{and}\quad \bar{x} + \frac{t_1 s}{\sqrt{\nu}},$$

and if $\mu_0$ lies in this range we accept it. In particular cases we may have $t_0 = t_1$, in which case
the intervals are central and our probability $\alpha$ is the chance of $t_1$ not being exceeded
in absolute value; or $t_0 = +\infty$, in which case $\alpha$ is the chance that $-t_1$ will be attained
or exceeded, and no lower limit to $\mu_0$ is imposed.


Example 21.1 

The weights of fifteen bags of sugar taken from a filling machine are found to be, in
ounces, 16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9, 16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8.
Each bag should be 16 ounces, but some deviation is inevitable. One of the
manufacturer’s problems, of course, is to keep this deviation to a minimum, but that is not the
point we now consider. Our question is: if the machine is supposed to be giving weights
of 16 ounces on the average, does the sample suggest that it is failing in its purpose?

The hypothesis is that the parent mean is 16 ounces and the deviations from this
mean are, in order of magnitude, −0.3 (twice), −0.2 (four times), −0.1 (twice), 0.0
(four times), 0.1 (twice), 0.2 (once). The sample mean of these deviations is thus −0.08 and to that extent
the average of the sample is slightly underweight. Is this a significant effect?

It will be found that $s^2 = 0.0216$ so that

$$t = \frac{0.08}{\sqrt{0.0216/14}} = 2.04, \qquad \nu = 14.$$

From Appendix Table 3 (vol. I, p. 440) we find that for $\nu = 14$ the probability of a deviation
greater in absolute magnitude than 2.04 is about 2(1 − 0.969) = 0.062. This is small,
but whether we regard it as significant or not depends on the probabilities we are prepared
to consider as defining significance. The usual values are 0.05 and 0.01, and with such
criteria we should not take the observed value as significant, though it arouses suspicions.

We have here used central intervals, which are usual for the t-test of significance
of the mean; but it is easy to imagine circumstances in this particular case for which
non-central intervals might be required. For instance, if the machine was at fault and
had a true mean filling weight of more than 16 ounces the manufacturer would be giving
sugar away for nothing. This might be serious, but probably not so serious as if the
machine was erring in the other direction, which would render him liable to prosecution
for selling short weight. Suppose he assessed the latter risk as nine times as serious as
the former and was working to a probability level of 0.05. Then he would require
the probability of a negative value of $t$ greater than the significance value to be
0.955 (= 1 − 0.045) but could allow that of a positive value less than the significance value
to be 0.995 (= 1 − 0.005). From Appendix Table 3 we see that this corresponds to
deviations of approximately −1.8 and +3.0. Our observed value is outside this range
and is thus significant. Small as the average shortage is, it would be prudent to overhaul
the machine and to make sure that it is giving fair weight on the average.

We may note further that if the sample had occurred in the order

15.7, 15.7, 15.8, 15.8, 15.8, 15.8, 15.9, 15.9, 16.0, 16.0, 16.0, 16.0, 16.1, 16.1, 16.2

we should almost certainly have concluded that there was something wrong with the
machine, for the weights are steadily rising. The t-test would give the same result for
this sample as for the first, since it does not depend on the order of occurrence of the
members. Where, therefore, the appearance of individual sample members is ordered in time,
the t-test alone may fail to reveal significant effects due to the changing of the population
between drawings. Our data are still such as could have arisen at a single drawing of
fifteen members from a population with mean equal to 16 ounces, but the data throw
doubt on the point whether we are really asking the right question in assuming that they
all came from the same population. We consider the point again below (21.41).

Before leaving this example, we may note another possible test, cruder than the t-test
but sometimes useful. If the parent mean of the deviations were really zero, positive and negative
deviations should occur equally frequently in the long run. In our present case there are 8
negative deviations, 3 positive ones and 4 zero. If we allot, conventionally, two of the
last to each group we have 10 negative and 5 positive deviations. The expected number
is 7½, so that the deviation is 2½, with a standard error of $\sqrt{15 \times \frac12 \times \frac12} = 1.94$. The
observed deviation is very little in excess of this, so we conclude that the preponderance
of negative signs in the sample is not significant of a negative mean in the population.
More exactly, we find that the probability of the occurrence of 5 or fewer positive deviations is the sum of
the first six terms in the binomial $(\frac12 + \frac12)^{15}$, namely 0.151, leading to the same conclusion.
The test is a very rough one since it pays no attention to the magnitude of the deviations;
but it has the advantage of applying to any symmetrical form of parent population for
finite samples.
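
The arithmetic of this example can be retraced with a few lines of code. The sketch below (Python, standard library only; all variable names are ours) recomputes the variance with divisor $n$, the statistic of (21.1), and the binomial tail used in the rough test of signs. It assumes, as in the example, that zero deviations are shared equally between the two groups.

```python
import math

weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]
mu0 = 16.0                       # parent mean specified by the hypothesis

n = len(weights)
nu = n - 1                       # degrees of freedom
mean = sum(weights) / n
s2 = sum((x - mean) ** 2 for x in weights) / n      # divisor n, as in the example
t = (mean - mu0) * math.sqrt(nu) / math.sqrt(s2)    # statistic (21.1)

# Test of signs: count the deviations from mu0 by sign.
devs = [round(x - mu0, 1) for x in weights]
neg = sum(d < 0 for d in devs)
pos = sum(d > 0 for d in devs)
zero = sum(d == 0 for d in devs)
pos_adj = pos + zero // 2        # half of the zero deviations allotted to each group

# Chance of pos_adj or fewer positive signs in n trials, each with probability 1/2.
tail = sum(math.comb(n, k) for k in range(pos_adj + 1)) / 2 ** n

print(round(mean - mu0, 2), round(s2, 4), round(t, 2))
print(neg, pos, zero, round(tail, 3))
```

The sign of `t` is negative here because the sample mean falls below 16 ounces; the example quotes its absolute value.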


Properties of the t-Distribution 

21.9. “Student’s” distribution has numerous applications in the testing of
significance apart from the one just considered, and we proceed to study its properties.

The form (21.2) is a Pearson Type VII and may be transformed to the Beta-distribution
(Type I) by the substitution $\xi = 1/(1 + t^2/\nu)$. The distribution function of $t$ may thus
be obtained direct from the B-function. For instance, we have, for $t \ge 0$,

$$F(t) = \int_{-\infty}^{t} \frac{\Gamma\{\frac12(\nu+1)\}}{\sqrt{\nu\pi}\;\Gamma(\frac12\nu)}\, \frac{dt}{(1+t^2/\nu)^{\frac12(\nu+1)}},$$

whence

$$2F - 1 = \frac{2}{\sqrt{\nu}\;B(\frac12\nu, \frac12)} \int_0^{t} \frac{dt}{(1+t^2/\nu)^{\frac12(\nu+1)}},$$

whence

$$2F - 1 = 1 - I_\xi(\tfrac12\nu, \tfrac12), \tag{21.3}$$

where $I_\xi(p, q)$ is the incomplete B-function ratio.

The values of the argument for which $I$ has the values 0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
0.005 and $\nu = 1\,(1)\,30$, 40, 60, 120, $\infty$, have been tabled to five significant figures by C. M.
Thompson and others (1941a) and can hence be used to derive the values of $t$ corresponding
to those probability levels.
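
The link between “Student’s” integral and the incomplete B-function can be checked numerically. The sketch below (Python, standard library only; the helper names are ours) assumes the standard identity that the two-sided tail $2(1 - F)$ equals the incomplete B-function ratio with parameters $\frac12\nu$ and $\frac12$ taken at $1/(1 + t^2/\nu)$, and evaluates both sides by Simpson's rule. The quadrature used for the B-function is a crude one, adequate only because the first parameter exceeds unity and the upper limit is below 1.

```python
import math

def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_two_sided_tail(t, nu):
    """2(1 - F): proportion of Student's distribution outside -t to +t."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    dens = lambda u: math.exp(log_c - 0.5 * (nu + 1) * math.log1p(u * u / nu))
    return 1.0 - 2.0 * simpson(dens, 0.0, t)

def incomplete_beta_ratio(x, p, q):
    """I_x(p, q) by direct quadrature of u^(p-1) (1-u)^(q-1) from 0 to x."""
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    integrand = lambda u: u ** (p - 1) * (1 - u) ** (q - 1)
    return simpson(integrand, 0.0, x) / math.exp(log_b)

nu, t = 14, 2.04
xi = 1.0 / (1.0 + t * t / nu)
lhs = t_two_sided_tail(t, nu)
rhs = incomplete_beta_ratio(xi, nu / 2, 0.5)
print(lhs, rhs)
```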


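21.10. Except for special purposes, however, the use of the B-function is unnecessary,
since the distribution function of $t$ itself and tables based thereon are available.

We have

$$\log\left(1+\frac{t^2}{\nu}\right) = \frac{t^2}{\nu} - \frac{t^4}{2\nu^2} + \ldots + \frac{(-1)^{r-1}\,t^{2r}}{r\,\nu^r} + \ldots$$

and hence

$$-\tfrac12(\nu+1)\log\left(1+\frac{t^2}{\nu}\right) = -\tfrac12 t^2 + \sum_{r=1}^{\infty}\left\{\frac{t^{2r}}{2r} - \frac{t^{2r+2}}{2r+2}\right\}\left(-\frac{1}{\nu}\right)^{r}. \tag{21.4}$$

Further, from the expansion for $\log \Gamma(1 + x)$ we find

$$\log \frac{\Gamma\{\frac12(\nu+1)\}}{\sqrt{\frac12\nu}\;\Gamma(\frac12\nu)} = -\frac{1}{4\nu} + \frac{1}{24\nu^3} - \frac{1}{20\nu^5} + \ldots \tag{21.5}$$

Now as $\nu$ tends to infinity, $t$ tends to the normal form with zero mean and unit variance.
Writing

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 t^2},$$

we find for the logarithm of the ordinate of (21.2), in descending powers of $\nu$,

$$\log y + \frac{1}{4\nu}(t^4 - 2t^2 - 1) - \frac{1}{12\nu^2}(2t^6 - 3t^4) + \frac{1}{24\nu^3}(3t^8 - 4t^6 + 1) - \frac{1}{40\nu^4}(4t^{10} - 5t^8) + \frac{1}{60\nu^5}(5t^{12} - 6t^{10} - 3) - \ldots \tag{21.6}$$

Taking the exponential and integrating from $t$ to $\infty$, we find

$$\begin{aligned} 1 - F = \int_t^\infty y\,dt + y\,\Big\{ &\frac{1}{4\nu}(t^2+1)\,t + \frac{1}{96\nu^2}(3t^6 - 7t^4 - 5t^2 - 3)\,t + \frac{1}{384\nu^3}(t^{10} - 11t^8 + 14t^6 + 6t^4 - 3t^2 - 15)\,t \\ &+ \frac{1}{92160\nu^4}(15t^{14} - 375t^{12} + 2225t^{10} - 2141t^8 - 939t^6 - 213t^4 - 915t^2 + 945)\,t + \ldots \Big\}. \end{aligned} \tag{21.7}$$

This is the expression, due to Fisher, which was used by “Student” himself in calculating
the distribution function of $t$ given in Appendix Table 3, Vol. I. For values of $\nu > 18$ the
first four terms of (21.7) give $F$ to an accuracy of about 0.000,005.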
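
To see how such an expansion in inverse powers of $\nu$ behaves in practice, the following sketch compares it with a direct numerical integration of the ordinate (21.2). It is written in Python with the standard library only; the function names are ours, and the polynomial coefficients are those of Fisher's expansion as commonly quoted (terms in $\nu^{-1}$ to $\nu^{-4}$), which we take as an assumption rather than as a transcription of the text.

```python
import math

def t_density(t, nu):
    """Ordinate of Student's distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(t * t / nu))

def upper_tail_numeric(t, nu, steps=4000):
    """1 - F(t) for t >= 0: one half minus a Simpson's-rule integral from 0 to t."""
    h = t / steps
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 0.5 - total * h / 3

def upper_tail_series(t, nu):
    """1 - F(t) as the normal tail plus correction terms up to 1/nu^4."""
    y = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)   # normal ordinate
    q = 0.5 * math.erfc(t / math.sqrt(2))                 # normal upper tail
    t2 = t * t
    c1 = (t2 + 1) / 4
    c2 = (3 * t2**3 - 7 * t2**2 - 5 * t2 - 3) / 96
    c3 = (t2**5 - 11 * t2**4 + 14 * t2**3 + 6 * t2**2 - 3 * t2 - 15) / 384
    c4 = (15 * t2**7 - 375 * t2**6 + 2225 * t2**5 - 2141 * t2**4
          - 939 * t2**3 - 213 * t2**2 - 915 * t2 + 945) / 92160
    return q + y * t * (c1 / nu + c2 / nu**2 + c3 / nu**3 + c4 / nu**4)

nu, t = 20, 2.0
exact = upper_tail_numeric(t, nu)
approx = upper_tail_series(t, nu)
print(exact, approx, abs(exact - approx))
```

Trying smaller values of `nu` in the last lines shows the series losing accuracy, in keeping with the remark that it is serviceable for the larger numbers of degrees of freedom.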


21.11. Tables are also available in the “inverse” form, that is to say, giving values
of $t$ corresponding to specified values of $\nu$ and $F$. Such tables may be derived by
interpolation from the “Student” tables or by the normalisation method of 6.32. In work
involving tests of significance this type of table is perhaps the most convenient, since it
enables one to decide without calculation (other than interpolation for values of the
argument not covered by the tables) whether particular values are significant for chosen
probability $\alpha$. The complement of the probability $\alpha$ is spoken of as a level of significance
and expressed either as a number between 0 and 1 or as a percentage. Similarly the
corresponding values of $t$ are called significance points, and we may speak, for example,
of the 5 per cent. value of $t$, meaning that value for which $F$ is 0.95.

Fisher and Yates (1938a) give the values of $t$ for $\nu = 1\,(1)\,30$, 40, 60, 120 and $\infty$ and
$2(1 - F) = 0.9\,(0.1)\,0.1$, 0.05, 0.02, 0.01, 0.001. These tables, it should be remembered,
give the significance points corresponding to twice $1 - F$, that is to say the values of $t$
such that the proportion of the distribution outside the range $\pm t$ is $2(1 - F)$.
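
An “inverse” table of this kind can be imitated by solving for $t$ numerically. The sketch below (Python, standard library only; the function names and the search interval are our own choices) finds by bisection the value of $t$ for which the proportion of the distribution outside $\pm t$ equals a given figure, integrating the ordinate by Simpson's rule.

```python
import math

def t_density(t, nu):
    """Ordinate of Student's distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(t * t / nu))

def two_sided_tail(t, nu, steps=1000):
    """Proportion of the distribution outside the range -t to +t."""
    h = t / steps
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 1.0 - 2.0 * total * h / 3

def significance_point(p, nu, lo=0.0, hi=50.0):
    """Value of t whose two-sided tail probability is p, found by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if two_sided_tail(mid, nu) > p:
            lo = mid             # tail still too large: the point lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_5pc = significance_point(0.05, 14)
t_1pc = significance_point(0.01, 14)
print(round(t_5pc, 3), round(t_1pc, 3))
```

The upper search limit of 50 is arbitrary and would need raising for very small probabilities with one or two degrees of freedom.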

21.12. The number v is usually called the number of degrees of freedom of t. This 
is an expression which occurs in other connections, and a few words of explanation are 
desirable. 

It has been seen that the variance of a normal sample is distributed like the sum of
$(n - 1)$ squares of independent variates (compare Example 10.6, vol. I, p. 238) and
generally, that if there are $k$ linear relations connecting the original variates, the sum of squares
of the originals is distributed as the sum of squares of $n - k$ independent normal variates of equal
variance. Each linear relation reduces the freedom of the variation, as it were, by unity.
It is thus natural to speak of the number of degrees of freedom, $\nu$, of a function such as
$\chi^2$, meaning thereby that it is distributed as the sum of squares of $\nu$ independent
normal variates with equal variance. The expression only has this natural meaning when
normal variation is concerned.

It so happens that the quantity $t$ depends on a parameter $\nu$ which is convenient for
tabulating its distribution function and is also the number of degrees of freedom of the
statistic $s^2$ entering into the denominator of $t$. $\nu$ may thus, by an extension of the term,
be called the number of degrees of freedom of $t$, but this usage does not imply that $t$ is
distributed as the sum of squares of normal variates.

Distribution of t in Non-normal Case 

21.13. Part of the price we have to pay for the precision of the t-test in small samples
is the assumption of normality in the parent. If the population is not normal we may still,
of course, consider the distribution of “Student’s” ratio, which will remain independent
of the scale parameter; but complications appear because the parameters which express
the deviation from normality will, in general, appear in the sampling distribution.
Furthermore, the distributions of $\bar{x}$ and $s$ are no longer independent.

Let us in the first instance prove the last assertion, which is due to Geary (1936b),
in the form: If the mean and variance in samples from a population are independent
and the population has finite cumulants, it must be normal.

From 11.13 we have

$$\kappa(2\,1^r) = \frac{\kappa_{r+2}}{n^r}, \qquad r > 0.$$

If mean and variance are independent, $\kappa(2\,1^r) = 0$ and hence $\kappa_{r+2} = 0$ for $r > 0$. Thus
the population must be normal. It is rather remarkable that we have not had to use
relations of the type $\kappa(2^s\,1^r) = 0$, $s > 1$, in arriving at this result and that we need only
assume independence for one size of sample.
21.14. In the notation of Chapter 11 we write



k — K 

and expand in terms of powers of ~The method follows that of 11.23 and we 
find for the moments of t about the parent mean, assumed zero, to order v~ 2 
/*i= — (2A, — 2A, + 5A 3 A 4 j 

ft = 1 + - (1 + Af) + 4 ( 3 ~ *4 - 3A > *• + «*§ *•) 

ft = - {^3 + — (210A, - 66A 5 4- 105A, A 4 + 210A|)| ‘ * (2L8) 

ft - 3 + - (9 - A 4 + 14A1) + i (102 - 30A 4 + 24A. 

V V 2 

+ 120A* + 4A fl 132A 3 A 5 - 6A? + 168A* A 4 + 120AJ) 
where A r = *T-. 

If the parent form is symmetrical, cumulants of odd order vanish and we have, to
order $\nu^{-2}$ and first order terms in the $\lambda$’s,

$$\begin{aligned} \mu_1' &= \mu_3 = 0, \\ \mu_2 &= 1 + \frac{2}{\nu} + \frac{6}{\nu^2} - \frac{2\lambda_4}{\nu^2} \approx \frac{\nu-1}{\nu-3} - \frac{2\lambda_4}{\nu^2}, \\ \mu_4 &= 3 + \frac{18}{\nu} + \frac{102}{\nu^2} - \frac{2\lambda_4}{\nu} - \frac{30\lambda_4}{\nu^2} \approx \frac{3(\nu-1)^2}{(\nu-3)(\nu-5)} - \frac{2\lambda_4}{\nu} - \frac{30\lambda_4}{\nu^2}. \end{aligned} \tag{21.9}$$

Except for the term in $\lambda_4$ these are the values of the moments of $t$ in “Student’s”
distribution, and it follows that for symmetrical parents which are not excessively lepto-
or platykurtic we should not expect the t-test to be invalidated. If the parent is skew
the situation may be different.


21.15. The general skew case has been considered by E. S. Pearson and Adyanthaya
(1928, 1929) from the experimental viewpoint and by Bartlett (1935a) and Geary (1936b)
from the theoretical viewpoint. Various writers have derived exact distributions of $t$
in non-normal samples, but the sample numbers are, as a rule, trivially small and the
results of little practical value. Geary considers the population expressed by the first
two terms of the Gram-Charlier series

$$dF = \frac{1}{\sqrt{2\pi}}\left\{1 - \frac{\kappa_3}{6}(3x - x^3)\right\} e^{-\frac12 x^2}\,dx \tag{21.10}$$

and assumes that powers of $\kappa_3$ above the first may be neglected. He finds (cf. Exercise
21.1) that the frequency function of $t$ in this population is equal to the “Student” form
plus a corrective factor

$$\frac{\kappa_3}{6\nu\sqrt{2\pi(\nu+1)}}\;\{3\nu - t^2(2\nu+1)\}\;\frac{t\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac12(\nu+4)}}. \tag{21.11}$$

The integral of this factor from $-\infty$ to $-t$ is

$$\frac{\kappa_3}{6}\,\frac{\nu}{\sqrt{2\pi(\nu+1)}}\left\{\frac{2\nu+1}{\nu} - \frac{2}{1+t^2/\nu}\right\}\frac{1}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac12\nu}}, \tag{21.12}$$

giving the correction to be applied. (Geary gives a table for some representative values.)
This, of course, depends on $\kappa_3$, but even where exact knowledge of the skewness is not
available we may sometimes safeguard against error by considering the correction for
plausible values of $\kappa_3$.


Other Uses of the t-distribution

21.16. The usefulness of “Student’s” $t$ derives from the fact that it is independent
of the scale parameter, and the simplicity of its distribution from the fact that it is the
ratio of two independent variates, the numerator distributed normally and the denominator
distributed in the Type III form. We shall see below (21.26) that these properties can
be used to test the difference of two means in normal populations with equal variance,
and in Chapter 22 we shall encounter a test of regression coefficients which is based on
the same properties.

We have also noted that “Student’s” form can be used to test the significance of the
product-moment correlation (14.15) and the Spearman rank correlation $\rho$ (16.18). These
facts are, however, in a sense accidental. They do not derive from the expression of the
parameters concerned as the ratio of a normal to a Type III variate, but from the simpler
fact that the distributions are of the Type II form (symmetrical with finite range) and
hence can be transformed to the “Student” distribution, which is of Type VII.
Symmetrical distributions of finite range can often be represented very approximately by a
transformation to the “Student” form, especially if they tend to normality.


Test of a Variance in Normal Samples 

21.17. The distribution of the sample variance $s^2$ in normal samples is

$$dF = \frac{1}{\Gamma\{\frac12(n-1)\}}\left(\frac{n}{2\sigma^2}\right)^{\frac12(n-1)} \exp\left(-\frac{ns^2}{2\sigma^2}\right)(s^2)^{\frac12(n-3)}\,ds^2, \qquad 0 \le s^2 < \infty. \tag{21.13}$$

Thus, given for consideration a value of $\sigma^2$ and an observed $s^2$, we can find the probability
that $s^2/\sigma^2$ is attained or exceeded and accept or reject $\sigma^2$ in the usual way. The
distribution function of (21.13) may be expressed as an incomplete $\Gamma$-function, or more
conveniently for statistical purposes in terms of $\chi^2$ ($= ns^2/\sigma^2$) with $\nu = n - 1$.


Example 21.2 

In Example 21.1 we found $s^2 = 0.0216$, $\nu = 14$. Could the data have arisen by chance
from a population in which the true variance is 0.01?

We have $\chi^2 = ns^2/\sigma^2 = 32.4$, $\nu = 14$. From the diagram on p. 446 of vol. I we see
that the probability of such a value or greater is between 0.01 and 0.001, a very improbable
result; and hence we reject $\sigma^2 = 0.01$ as a value of the parent variance.

Once again this type of inference can be justified by the theory of confidence intervals
since the probability

$$P\left\{\frac{ns^2}{\sigma^2} > 32.4\right\} < 0.01$$

is equivalent to

$$P\left\{\sigma^2 < \frac{ns^2}{32.4}\right\} < 0.01.$$

In asserting that $\sigma^2$ was less than $ns^2/32.4$ (in our present case 0.01) we should be wrong
more than 99 times in 100 on the average.

There is a point of interest to note here. In Example 21.1 we considered a hypothesis
as to the mean $\mu$, and in the present example a hypothesis as to the variance $\sigma^2$. Had we
considered the two together, that is to say the compound hypothesis that $\mu = 16$ and
$\sigma^2 = 0.01$, we should have been in difficulties in justifying our procedure by reference to
confidence or fiducial intervals, since we could no longer assert that our conclusions were
right in an assigned proportion of cases. We have avoided this complication by
considering separately the hypotheses (a) that $\mu = 16$ whatever the variance, and (b) that
$\sigma^2 = 0.01$ whatever the mean. This resource is not as a rule open to us where non-normal
variation is concerned.

Tests of Normality 

21.18. In large samples we can group the data into ranges and compare the actual
frequencies with those to be expected on the hypothesis of parent normality. This
comparison over the course of the frequency function is not satisfactory for small samples
unless the grouping is so broad as to deprive the test of most of its efficacy. An
alternative is to compute some statistic of the sample and to examine how far it departs from
the mean value to be expected on the hypothesis of parent normality.

Consider, for instance, the statistic

$$t = \frac{k_3}{k_2^{3/2}}. \qquad (21.14)$$

This is independent of the mean (because the $k$-statistics are so) and is also independent
of the scale parameter because it is “studentised.” In normal samples, therefore, the
distribution of $t$ is independent of mean and variance and thus depends only on the sample
number $n$. We have already given formulae for its mean and variance (Exercise 11.16,
vol. I, p. 289). In fact,

$$\mu_1(t) = \mu_3(t) = 0, \qquad \mu_2(t) = \frac{6n(n-1)}{(n-2)(n+1)(n+3)}. \qquad (21.15)$$

Since the distribution of $t$ is symmetrical we may, for moderate $n$, consider it as normally
distributed with zero mean and variance given by (21.15), and this will provide a test,
of a somewhat approximate kind, of normality in the parent from which the sample is
derived.

Example 21.3 

In the data of Examples 21.1 and 21.2 we have, for the sample moments about origin
16, in units of 0.1,

$$m_1' = -0.8, \qquad m_2 = 2.16, \qquad m_3 = 0.496,$$

whence

$$k_2 = \frac{n}{n-1}\, m_2 = 2.31429, \qquad k_3 = \frac{n^2}{(n-1)(n-2)}\, m_3 = 0.61319,$$

and

$$t = \frac{k_3}{k_2^{3/2}} = 0.174.$$

The variance of $t$, from (21.15), is 0.3188 and its standard error accordingly about
0.57. The observed deviation from zero is considerably less than this, and we see no reason
to doubt the hypothesis of normality so far as this test is concerned.
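As a check on the arithmetic of Example 21.3, the following sketch converts the central sample moments into $k$-statistics and forms the ratio $k_3/k_2^{3/2}$. It assumes $n = 15$ (from $\nu = 14$ in Example 21.2) and that $m_2$ and $m_3$ are moments about the sample mean in the working units; the function name `skewness_statistic` is an illustrative choice.

```python
def skewness_statistic(n, m2, m3):
    """Return (k2, k3, t) where t = k3 / k2**1.5.

    m2 and m3 are the second and third sample moments about the mean
    (divisor n); k2 and k3 are the corresponding k-statistics.
    """
    k2 = n / (n - 1) * m2
    k3 = n ** 2 / ((n - 1) * (n - 2)) * m3
    t = k3 / k2 ** 1.5
    return k2, k3, t

k2, k3, t = skewness_statistic(15, 2.16, 0.496)
print(round(k2, 5), round(k3, 5), round(t, 3))
```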


21.19. Another test of normality has been proposed by Geary (1935a), namely
the use of the ratio

$$w = \frac{\text{mean deviation}}{\text{standard deviation}}. \qquad (21.16)$$

If the parent mean is zero, the parent value of $w$ is $\sqrt{2/\pi} = 0.79788$. The test has also
been adapted to the case when the parent mean is not zero, and tables provided for the
application of the test (Geary and Pearson, 1938).

Geary’s ratio is directed towards detecting deviations from mesokurtosis in the parent.
The criterion based on $k_4/k_2^2$, which is a natural extension of that for skewness based on
$k_3/k_2^{3/2}$, is not very suitable for the purpose, since it has a skew distribution for quite high
values of $n$. The distribution of Geary’s ratio tends to normality fairly rapidly
(cf. Exercise 21.2).


Tests of Goodness of Fit

21.20. In Chapter 12 we considered in some detail the use of $\chi^2$ in testing
correspondence between observation and hypothesis. If the hypothesis specifies the theoretical
values completely no question of estimation arises, and each cell contributing to $\chi^2$ could,
if so desired, be tested separately. From this point of view $\chi^2$ compounds into a single
test a number of tests of the kind already considered.

If the hypothesis does not specify the theoretical values completely, but leaves them
to be estimated in part from the data, some modification in the $\chi^2$-test is necessary. We
can now establish a result which in 12.13 was announced without proof: if the estimators
employed are maximum likelihood estimators, then for large samples the $\chi^2$-test of
significance retains its validity, provided that the number of degrees of freedom is reduced by
unity for every parameter estimated.

Suppose the hypothesis leaves unspecified a parameter $\theta$, and let $t$ be its maximum
likelihood estimator. Then if the theoretical frequencies based on the true value of $\theta$
are $\lambda$ and those based on $t$ are $\lambda'$, we may write, $l$ denoting the observed frequencies,

$$\chi^2 = \sum \frac{(l - \lambda)^2}{\lambda}, \qquad (21.17)$$

$$\chi'^2 = \sum \frac{(l - \lambda')^2}{\lambda'}. \qquad (21.18)$$

$\chi^2$ is distributed as the sum of squares of $\nu$ normal variates with unit variance. The problem
is to find the distribution of $\chi'^2$. We have





and for large samples the difference between A and A' will be of order 
expanding the difference in terms of 60, to order n~ 1 , 

1 1 1 9A' [ 2 /dry 1 3*A'l (60)* 

, A A' A' 2 do + \A' 3 V do) A' 2 30 2 J 2 


+ 


We then have, 


(21.19) 


Now for large samples the maximisation of the likelihood is equivalent to minimising x 2 > 


and hence 



and 

\A 2 00) 



(*>•*/ 

* * 2 \ A' \ 90 / 00*} 



=<^(sn- • • 

. (21.20) 


But the sum on the right is the reciprocal of the variance of the maximum likelihood
estimator, and writing $\delta t$ for $\delta\theta$, as is legitimate for large samples, we have

$$\chi^2 - \chi'^2 = \frac{(\delta t)^2}{\operatorname{var} t}. \qquad (21.21)$$

The quantity on the right is itself the square of a variate which (in the limit) is normal
and has unit variance. Furthermore, its distribution is independent of that of $\chi'^2$. For
consider the spherically symmetric density-distribution of the $\nu$ normal variables whose
sum of squares composes $\chi^2$. Let $O$ be the origin and $P$ any point; then $\chi^2 = OP^2$. Now
for large samples the variation takes place in the neighbourhood of $O$. A surface of
constant $t$ through $P$ is approximately plane in the effective range of variation. If $OQ$ is the
normal to this surface,

$$OP^2 = OQ^2 + PQ^2,$$

corresponding to

$$\chi^2 = \frac{(\delta t)^2}{\operatorname{var} t} + \chi'^2,$$

for $t$ is chosen so as to minimise $\chi'^2 = PQ^2$. Thus if we take $t$ as a new co-ordinate, together
with $(\nu - 1)$ others in the surface of constant $t$, the axis of $t$ is orthogonal to the space of
constant $t$, and $t$ will be independent of $\chi'^2$.

It follows further that $\chi'^2$ is distributed as the sum of $(\nu - 1)$ squares of normal
variates. Thus the usual Type III distribution of $\chi^2$ holds for $\nu - 1$ degrees of freedom;
and so for every constant fitted, with a reduction of unity in the number of degrees for
each constant. We have already exemplified the use of the result in Example 12.4 (Vol. I,
p. 301).


The $\omega^2$-distribution

21.21. For small samples the $\chi^2$-test is difficult to apply, since it depends for its
validity on the fact that the binomial distribution in individual cells may be represented
by the normal distribution, and hence requires that cell-frequencies shall not be small.
A test of a different kind has been proposed by Cramér (1928) and independently by von
Mises (1931).

Put

$$\omega^2 = \int_{-\infty}^{\infty} \{\bar{F}(x) - F(x)\}^2\, dx, \qquad (21.22)$$

where $\bar{F}(x)$ is the observed distribution function and $F(x)$ the hypothetical distribution
function. The quantity $\omega^2$ varies from sample to sample, its mean value being

$$E(\omega^2) = \frac{1}{n} \int_{-\infty}^{\infty} F(x)\{1 - F(x)\}\, dx = \frac{1}{2n}\,\Delta_1, \qquad (21.23)$$

where $\Delta_1$ is Gini’s coefficient of mean difference (cf. 2.24). For

$$E(\omega^2) = \int_{-\infty}^{\infty} E\{\bar{F} - F\}^2\, dx.$$

For any given $x$ the expectation of $(\bar{F} - F)^2$ is merely the variance of the proportion $\bar{F}$
and hence is equal to $\dfrac{F(1 - F)}{n}$. The result (21.23) follows at once.

The $\omega^2$-test consists of comparing the observed with the mean value; but it is not
possible to express the comparison in terms of probability as the sampling distribution
of $\omega^2$ is not known.
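The mean value (21.23) is easily evaluated numerically for a given hypothetical distribution. The sketch below does so for a normal parent with unit variance by the trapezoidal rule; the integration limits, the step count and the function names are arbitrary choices made for the illustration. For this parent the mean difference is $2/\sqrt{\pi}$, so that $n\,E(\omega^2)$ should come out close to $1/\sqrt{\pi}$.

```python
import math

def normal_cdf(x):
    """Distribution function of the normal law with zero mean and unit variance."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_omega2(cdf, n, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal estimate of (1/n) * integral of F(x) * (1 - F(x)) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        f = cdf(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f * (1.0 - f)
    return total * h / n

n = 20  # illustrative sample number
value = expected_omega2(normal_cdf, n)
print(n * value, 1.0 / math.sqrt(math.pi))
```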


21.22. The numerical evaluation of the integral (21.22) is tedious in the case of a
continuous distribution, and Wold (1938a) has suggested a modification. If the variate
range is divided into intervals at $-\infty, x_1, x_2, \ldots, x_i, \ldots, \infty$, we define

$$w^2 = \sum_i \{\bar{F}(x_i) - F(x_i)\}^2. \qquad (21.24)$$

If the intervals are all of width $h$,

$$E(w^2) = \frac{1}{nh} \int_{-\infty}^{\infty} F(x)\{1 - F(x)\}\, dx + \frac{R}{n}, \qquad (21.25)$$

where $R/n$ is a remainder term. If this may be neglected, the $w^2$-test is equivalent to the
$\omega^2$-test but easier to apply. If the data are ungrouped, the $x_i$’s may be taken at equidistant
intervals.

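In the particular case when $F$ is normal, we have

$$n\,E(\omega^2) = \int_{-\infty}^{\infty} \int_{x}^{\infty} \int_{-\infty}^{x} \frac{1}{\sqrt{(2\pi)}} \cdot \frac{1}{\sqrt{(2\pi)}}\, e^{-\frac{1}{2}(u^2 + v^2)}\, du\, dv\, dx.$$

Putting $u = \alpha + x$ and $v = \beta + x$, we find, after integration with respect to $x$,
A further substitution of $\gamma = \alpha - \beta$ and $\delta = \alpha + \beta$ gives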


(21.26) 


1 


r f 


>00 
21.23. An interesting modification of the $\omega^2$-test has been given by Smirnoff (1936),
who defines

$$\omega^2 = \int (\bar{F} - F)^2\, dF. \qquad (21.28)$$

The difference lies in the differential element $dF$, which has the effect of rendering
the distribution of $\omega^2$ independent of $F$. It is shown that as $n$ tends to infinity the
distribution function of $\omega^2$ tends to the form


1 yr C 2kn e-W^dz 

V(“ 3 sin z)' 


(21.29) 


but this does not look a very promising formula for application in particular cases.

Cramér (1928) has extended formula (21.27) to the goodness of fit of Gram-Charlier
series and gives some examples of fitting to observed distributions.


Difference of Two Means 

21.24. A common case occurring in practice is that of two independent samples of
$n_1$ and $n_2$ members from two populations which may or may not be different. We wish
to decide whether the evidence indicates a significant difference between the parent means.
This situation forms a kind of border-line case between the testing of a prior value of a
parameter and the homogeneity tests which we shall consider below. It is a test of
homogeneity in the sense that we are to discuss the question whether two populations are equal
in certain respects; but we do not necessarily assume that they are identical, and in any
case we can regard the problem as equivalent to the testing of a single parameter (the
difference of the means) to see whether it is different from zero.


21.25. For large samples we discussed the question in Example 9.10 (Vol. I, p. 220)
and gave two tests. If the hypothesis is that the parent populations are identical (a true
hypothesis of homogeneity) we may pool the samples to form a single sample and test
whether either mean differs from the mean of the total. If, however, we wish to test the
less general hypothesis that the parents have the same mean but not necessarily the same
variance, we may test the difference of means by the ordinary equation expressing the
variance of a difference in terms of the separate variances. This is not a homogeneity test
in the strictest sense of the word, but tests of such a character may conveniently be
discussed in conjunction with the other type, both for small and for large samples.


21.26. We now consider the corresponding problem when the samples are small
and the parent populations are assumed to be normal. In the first place we take the
case when the two populations have the same variance $\sigma^2$.

The sample means $\bar{x}_1$ and $\bar{x}_2$ are distributed normally with variances $\dfrac{\sigma^2}{n_1}$ and $\dfrac{\sigma^2}{n_2}$ and
means $\mu_1$ and $\mu_2$. Consequently

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma}$$

is distributed normally with variance $\dfrac{1}{n_1} + \dfrac{1}{n_2}$, and hence

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \qquad (21.30)$$

is distributed normally with unit variance about zero mean. Further, if $S_1^2$ and $S_2^2$ are
the sample sums of squares about the mean, the quantity

$$\frac{S^2}{\sigma^2} = \frac{S_1^2 + S_2^2}{\sigma^2} \qquad (21.31)$$

is distributed as $\chi^2$ with $n_1 + n_2 - 2$ degrees of freedom, independently of the expression
(21.30). It follows that

$$u = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}} \qquad (21.32)$$

is distributed like “Student’s” $t$ with $\nu = n_1 + n_2 - 2$ degrees of freedom. This
expression does not contain the unknown $\sigma$ and hence may be used to test the difference $\mu_1 - \mu_2$.
This result is due to Fisher (1926a).

Example 21.4 

In a class of 20 children, 10 chosen at random were given a ration of orange-juice
each day for a certain period and the other 10 a ration of milk. Their gains in weight
during the period were, in pounds:

First group : 4, 2½, 3½, 4, 1½, 1, 3½, 3, 2½, 3½

Second group : 1½, 3½, 2½, 3, 2½, 2, 2, 2½, 1½, 3

The mean increase in the first group is 2.9 pounds, and in the second 2.4 pounds. Putting
aside other explanations, one possible factor accounting for this difference is the difference
in treatments. But we wish to know in the first place whether this is significant. We
assume, then, that treatment exerted no differential effect and that the samples came
from normal populations with the same mean and variance. We find

$$\bar{x}_1 = 2.9, \qquad \bar{x}_2 = 2.4,$$

$$\sum (x_1 - \bar{x}_1)^2 = 9.4, \qquad \sum (x_2 - \bar{x}_2)^2 = 3.9.$$

Hence, from (21.32), with $\mu_1 - \mu_2 = 0$,

$$u = \frac{0.5}{\sqrt{13.3}} \sqrt{\frac{10 \times 10 \times 18}{20}} = 1.30, \qquad \nu = 10 + 10 - 2 = 18.$$

From Appendix Table 3 (vol. I, p. 441) we see that such a value would be exceeded in
absolute value with probability 0.21. The difference of a half-pound between the sample
means is not significant.

We note incidentally that the sample variances, 0.940 and 0.390, differ considerably,
and shall see below how the significance of the difference may be tested. At the present
stage our conclusion as to the non-significance of the difference of means is to be regarded
with reserve, for the data themselves suggest that we have over-simplified the problem
in assuming equal variance in the two populations.
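The computation of Example 21.4 may be sketched as follows. The code forms the sums of squares about the two sample means and the statistic $u$ of (21.32) with $\mu_1 - \mu_2 = 0$; the function names are illustrative, and no tail probability is computed since that is read from the tables.

```python
import math

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]    # orange-juice group
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]   # milk group

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_about_mean(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_u(a, b, mean_diff=0.0):
    """Statistic u of (21.32) and its degrees of freedom, for a common variance."""
    n1, n2 = len(a), len(b)
    s = math.sqrt(sum_sq_about_mean(a) + sum_sq_about_mean(b))
    scale = math.sqrt(n1 * n2 * (n1 + n2 - 2) / (n1 + n2))
    return (mean(a) - mean(b) - mean_diff) / s * scale, n1 + n2 - 2

u, nu = pooled_u(first, second)
print(round(u, 2), nu)
```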

21.27. Apart from the question of unequal variances, the data of the previous
example will serve to illustrate a further point of interest. Our hypothesis is that the
children within each group may be regarded as a sample from a population with the same
mean. Had we been dealing with a sample of, say, seedlings grown from the seed of a
single plant, this hypothesis would not have been unreasonable; but children differ very
much among themselves in nutritional standard, and so forth. Our hypothesis is again
liable to over-simplify the problem.

When the statistician can direct the sampling himself, this kind of problem can be
tackled with success by pairing. Suppose we select children in pairs of the same sex,
each pair resembling each other as closely as possible in all the factors which might influence
the experiment such as age, weight and nutritional standard. We allot at random one
member to the first group and one to the second, and so for each pair. The differences
in weights gained between members of a pair may then be regarded as samples from
a population with zero mean, even if the pairs differ among themselves, and the set of
differences tested in the usual way.

Example 21.5 

Suppose that, in the previous example, the data had related to 10 pairs of children,
thus:

No. of Pair.    First Group     Second Group    Difference,
                wt. in lbs.     wt. in lbs.     First - Second.

     1               4               1½              2½
     2               2½              3½             -1
     3               3½              2½              1
     4               4               3               1
     5               1½              2½             -1
     6               1               2              -1
     7               3½              2               1½
     8               3               2½              ½
     9               2½              1½              1
    10               3½              3               ½

  Totals            29              24               5

For the values in the last column we find

$$\bar{x} = 0.5, \qquad s^2 = 1.25, \qquad \nu = 9,$$

$$t = \frac{0.5}{\sqrt{1.25}} \sqrt{9} = 1.34.$$

The probability of obtaining such a value or greater (absolutely) is about 0.22, and
the observed differences are therefore not significant. This is the same conclusion that
we reached in Example 21.4, but it would not have been surprising had the conclusions
differed, for they relate to different questions.
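The paired computation of Example 21.5 can be sketched in the same way. The variance $s^2$ is taken with divisor $n$, as in the example, so that $t = (\bar{x}/s)\sqrt{(n-1)}$; the function name `paired_t` is an illustrative choice.

```python
import math

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]

def paired_t(a, b):
    """Return (mean difference, s^2 with divisor n, t, degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    s2 = sum((d - d_bar) ** 2 for d in diffs) / n
    t = d_bar / math.sqrt(s2) * math.sqrt(n - 1)
    return d_bar, s2, t, n - 1

d_bar, s2, t, nu = paired_t(first, second)
print(d_bar, s2, round(t, 2), nu)
```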


Difference of Means when Variances are Unequal 

21.28. When population variances are not assumed equal the $t$-test of difference
of means no longer applies. We can, if we choose, apply a test based on fiducial intervals,
namely, the Behrens test, considered in the previous chapter. We put

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1'^2 + s_2'^2)}}. \qquad (21.33)$$

The fiducial limits of $d$ for various significance levels have been tabulated by Sukhatme
(1938b) and Fisher (1941a) for $n_1$ and $n_2$ greater than 5. If the observed $d$ falls inside the
range, we may accept the hypothesis that the population means are equal.

21.29. As we have seen, an inference of this kind does not imply that we shall be
correct in a certain proportion of the cases, and if we wish to find a test satisfying such
a criterion a different approach is necessary. The following investigation is due to Welch
(1938b).

Consider the distribution of $u$ of equation (21.32) when the means are the same but
the variances are different, i.e.

$$u = \frac{\bar{x}_1 - \bar{x}_2}{\left\{\dfrac{S_1^2 + S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}}, \qquad (21.34)$$

in which the numerator is

$$\bar{x}_1 - \bar{x}_2 = \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{1/2} X \qquad (21.35)$$

and the square of the denominator is

$$\frac{\sigma_1^2 \chi_1^2 + \sigma_2^2 \chi_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad (21.36)$$

where $\sigma_1^2 \chi_1^2 = S_1^2$ and hence $\chi_1^2$ is distributed as $\chi^2$ with $\nu_1 = n_1 - 1$ degrees of freedom,
and similarly for $\chi_2^2$. $X$ may be regarded as a single normal variate with zero mean and
unit variance. We have then

$$u = \frac{X}{\sqrt{w}}, \qquad (21.37)$$

where

$$w = a\chi_1^2 + b\chi_2^2 \qquad (21.38)$$

and, from (21.36),

$$a = \frac{\sigma_1^2}{n_1 + n_2 - 2} \cdot \frac{\dfrac{1}{n_1} + \dfrac{1}{n_2}}{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}, \qquad b = \frac{\sigma_2^2}{n_1 + n_2 - 2} \cdot \frac{\dfrac{1}{n_1} + \dfrac{1}{n_2}}{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}. \qquad (21.39)$$

$w$ itself is not distributed in the Type III form unless $\sigma_1 = \sigma_2$, but we will find a distribution
of that form which approximates to it by equating lower moments. The first two moments
of $w$, being the sum of the separate parts, are

$$\mu_1'(w) = a\nu_1 + b\nu_2, \qquad \mu_2(w) = 2(a^2\nu_1 + b^2\nu_2). \qquad (21.40)$$

The moments of

$$dF = \frac{1}{(2g)^{\frac{1}{2}\nu}\, \Gamma(\frac{1}{2}\nu)}\, w^{\frac{1}{2}\nu - 1} e^{-w/2g}\, dw$$

are

$$\mu_1' = g\nu, \qquad \mu_2 = 2g^2\nu. \qquad (21.41)$$

Identifying (21.40) and (21.41) we find

$$g = \frac{a^2\nu_1 + b^2\nu_2}{a\nu_1 + b\nu_2}, \qquad \nu = \frac{(a\nu_1 + b\nu_2)^2}{a^2\nu_1 + b^2\nu_2}. \qquad (21.42)$$

With these values of $g$ and $\nu$ the distribution of $w/g$ is approximately of the Type III form
with $\nu$ degrees of freedom and will be independent of $X$. Hence,

$$\frac{X\sqrt{\nu}}{\sqrt{(w/g)}} = u\sqrt{(g\nu)} \qquad (21.43)$$

is distributed approximately as “Student’s” $t$ with $\nu$ degrees of freedom. In particular,
if $\sigma_1 = \sigma_2$, $a = b$ and we reduce to the test of 21.26.


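21.30. In general, when $\sigma_1 \neq \sigma_2$ the quantities $g$ and $\nu$ depend on the ratio
$\theta = \sigma_1^2/\sigma_2^2$. We have

$$\nu = \frac{(\nu_1\theta + \nu_2)^2}{\nu_1\theta^2 + \nu_2} \qquad (21.44)$$

and may put $u = ct$ where $c = 1/\sqrt{(\nu g)}$, and hence

$$c^2 = \frac{(n_1 + n_2 - 2)\left(\dfrac{\theta}{n_1} + \dfrac{1}{n_2}\right)}{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)(\nu_1\theta + \nu_2)}. \qquad (21.45)$$

Without a definite knowledge of $\theta$ we cannot apply the $t$-test, but the advantage of putting
the expressions in this form is that by considering particular values of $\theta$ we are able to
judge how far the test based on “Student’s” distribution is likely to be affected.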


Example 21.6 (from Welch, 1938b)

Consider the case $n_1 = n_2 = 10$. From (21.45) we have $c = 1$ and from (21.44)

$$\nu = \frac{9(\theta + 1)^2}{\theta^2 + 1}.$$

Suppose now we were to use the test of 21.26, based on the assumption that $\theta = 1$. We
should find, to a probability level of 0.05, that $|u|$ must exceed 2.101 to be significant.
If we judge $u$ significant for such values how far are we in error when $\theta$ is not unity? That
is to say, what are the true probabilities

$$P\{|u| > 2.101\}$$

for varying values of $\theta$, as compared with our value of 0.05?

For a specified $\theta$ the probabilities can easily be obtained from the approximate
distribution of $u\sqrt{(g\nu)}$ of equation (21.43). They are shown graphically in Fig. 21.1. The full
line (a) shows $P$ for various values of $\theta$ and $n_1 = n_2 = 10$. The full line (b) shows similarly
the values for $n_1 = 5$, $n_2 = 15$. (The dotted line (c) we refer to below.)
[Fig. 21.1. Values of $P$ plotted against values of $\theta$ (logarithmic scale); curves (a), (b) and (c).]

In case (a) the line does not deviate very much from the horizontal at $P = 0.05$, and we may
conclude that the test based on the assumption of equal variance is not very much in error. In
any case, if the curve falls below the line $P = 0.05$ we are on the safe side, for our true
probability is then less than 0.05, and in rejecting the hypothesis at that level we are adopting
more stringent standards than is apparent.

In case (b), when the sample numbers are unequal we have a different state of affairs.
For $\theta < 1$ the test is very conservative, but for $\theta > 1$ it may err very
seriously in the wrong direction.
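The dependence of the approximating degrees of freedom on the variance ratio can be tabulated with a short sketch. It takes $\nu = (\nu_1\theta + \nu_2)^2/(\nu_1\theta^2 + \nu_2)$ and obtains $c$ from the definition $c = 1/\sqrt{(\nu g)}$, which reduces to $c^2 = 1/(a\nu_1 + b\nu_2)$ with $a$ and $b$ as defined for the pooled statistic $u$; the closed expression used for $c^2$ below is derived on that assumption, and the function name is illustrative.

```python
import math

def approx_nu_and_c(theta, n1, n2):
    """Approximating degrees of freedom nu and scale c for the pooled statistic u.

    theta is the ratio sigma_1^2 / sigma_2^2 of the parent variances.
    """
    nu1, nu2 = n1 - 1, n2 - 1
    nu = (nu1 * theta + nu2) ** 2 / (nu1 * theta ** 2 + nu2)
    c2 = ((n1 + n2 - 2) * (theta / n1 + 1.0 / n2)) / (
        (1.0 / n1 + 1.0 / n2) * (nu1 * theta + nu2)
    )
    return nu, math.sqrt(c2)

for theta in (0.1, 1.0, 10.0):
    print(theta, approx_nu_and_c(theta, 10, 10), approx_nu_and_c(theta, 5, 15))
```

For equal sample numbers $c$ is unity whatever $\theta$, and only the degrees of freedom change; for unequal numbers both vary, which is the source of the behaviour of curve (b).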


21.31. Welch concludes that for samples of equal size there is not a serious
likelihood of error in testing the difference of means as if the parent variances were equal. For
samples of unequal size the error may invalidate the $t$-test and an alternative criterion is
proposed. Write

$$v = \frac{\bar{x}_1 - \bar{x}_2}{\left\{\dfrac{S_1^2}{n_1(n_1 - 1)} + \dfrac{S_2^2}{n_2(n_2 - 1)}\right\}^{1/2}}. \qquad (21.46)$$

Here, it will be observed, the denominator is an estimate of $\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{1/2}$, the standard
deviation of the difference $\bar{x}_1 - \bar{x}_2$. Precisely as for $u$ we approximate to the distribution
of this denominator by a Type III form. Corresponding to (21.39) we find

$$a = \frac{\sigma_1^2}{n_1(n_1 - 1)} \Big/ \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right), \qquad b = \frac{\sigma_2^2}{n_2(n_2 - 1)} \Big/ \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right). \qquad (21.47)$$

Corresponding to (21.45) we find $c = 1$, and to (21.44)

$$\nu = \left(\frac{\theta}{n_1} + \frac{1}{n_2}\right)^2 \Big/ \left\{\frac{\theta^2}{n_1^2(n_1 - 1)} + \frac{1}{n_2^2(n_2 - 1)}\right\}. \qquad (21.48)$$

$v$ is then distributed approximately in “Student’s” form with $\nu$ degrees of freedom. The
dotted line (c) in Fig. 21.1 shows the relationship between $\theta$ and $P\{|v| > 2.101\}$ for
$n_1 = 5$, $n_2 = 15$. Clearly the error is now much smaller than when we used $u$ for the same
sample numbers.
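A sketch of the alternative criterion follows. It computes $v$ from two samples, each variance being estimated separately, together with the approximating degrees of freedom for an assumed value of $\theta = \sigma_1^2/\sigma_2^2$, taken in the form $\nu = (\theta/n_1 + 1/n_2)^2 / \{\theta^2/(n_1^2(n_1-1)) + 1/(n_2^2(n_2-1))\}$. The data are those of Example 21.4 and the function names are illustrative; $\theta$ is supposed known here, which in practice it is not.

```python
import math

def welch_v(a, b):
    """Statistic v: difference of means over its separately estimated standard error."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    return (m1 - m2) / math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))

def welch_nu(theta, n1, n2):
    """Approximating degrees of freedom for v when sigma_1^2 / sigma_2^2 = theta."""
    num = (theta / n1 + 1.0 / n2) ** 2
    den = theta ** 2 / (n1 ** 2 * (n1 - 1)) + 1.0 / (n2 ** 2 * (n2 - 1))
    return num / den

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]
v = welch_v(first, second)
print(round(v, 3), welch_nu(1.0, 10, 10), round(welch_nu(1.0, 5, 15), 3))
```

With equal sample numbers $v$ coincides numerically with $u$; the two statistics differ only when $n_1 \neq n_2$.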
Difference of Two Variances in Normal Samples

21.32. If we have samples of $n_1$ and $n_2$ members from normal populations with
variances $\sigma_1^2$ and $\sigma_2^2$, the ratio of sample variances $p^2 = \dfrac{s_1^2}{s_2^2}$ is distributed in the form (cf.
Example 10.18, vol. I, p. 249)

$$dF \propto \frac{p^{n_1 - 2}\, dp}{\left(\dfrac{n_1 p^2}{\sigma_1^2} + \dfrac{n_2}{\sigma_2^2}\right)^{\frac{1}{2}(n_1 + n_2 - 2)}}. \qquad (21.49)$$

The related quantity

$$z = \tfrac{1}{2} \log_e \left\{\frac{n_1 s_1^2}{n_1 - 1} \Big/ \frac{n_2 s_2^2}{n_2 - 1}\right\} \qquad (21.50)$$

is distributed, when $\sigma_1^2 = \sigma_2^2$, in Fisher’s form

$$dF \propto \frac{e^{\nu_1 z}\, dz}{(\nu_1 e^{2z} + \nu_2)^{\frac{1}{2}(\nu_1 + \nu_2)}}, \qquad (21.51)$$

where $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$. The $\nu$’s may, by a convenient extension of our previous
terminology, be called the degrees of freedom associated with $z$. In practice, $z$ is generally
used in preference to $p$, but tables of both are available.

These distributions provide a test of significance for the ratio $\sigma_1^2/\sigma_2^2$.
On the hypothesis of equality they are independent of the ratio and the probability of
an observed $p$ or $z$ can be obtained. As usual, if this is small we reject the hypothesis.
We leave it to the reader to show that this type of inference can be based on the theory
of confidence intervals or the theory of fiducial intervals in the usual way.


Example 21.7 

In Example 21.4 we had two samples of children and found that the difference in
means was not significant. This was on the hypothesis that the variances were identical,
and since the two samples are equal in number the inference remains valid even if the
variances are different, as illustrated in 21.31. We will now test directly whether the
sample variances themselves indicate any significant difference in parent variances.

We have

$$\sum (x_1 - \bar{x}_1)^2 = 9.40, \qquad \nu_1 = 9,$$

$$\sum (x_2 - \bar{x}_2)^2 = 3.90, \qquad \nu_2 = 9.$$

Hence

$$z = \tfrac{1}{2} \log_e \left(\frac{9.40}{9} \Big/ \frac{3.90}{9}\right) = 0.4398.$$

From Appendix Tables 4 and 5 of Vol. I (pp. 442-3) we see that for $\nu_2 = 9$ the 5-per-cent
points of $z$ are

$\nu_1 = 8$, 0.5862

$\nu_1 = 12$, 0.5613

and the 1-per-cent points are

$\nu_1 = 8$, 0.8494

$\nu_1 = 12$, 0.8157.

Thus, notwithstanding that one variance is about 2.4 times the other, the probability that
the observed $z$ will be exceeded on random sampling from populations with the same
variance is greater than 0.05, and the difference of sample variances is not significant.
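The value of $z$ in Example 21.7 may be verified by the following sketch, which forms half the natural logarithm of the ratio of the two variance estimates (sum of squares divided by degrees of freedom). The function name `fisher_z` is an illustrative choice; the percentage points themselves are still to be read from the tables.

```python
import math

def fisher_z(ss1, nu1, ss2, nu2):
    """Half the natural log of the ratio of two variance estimates ss/nu."""
    return 0.5 * math.log((ss1 / nu1) / (ss2 / nu2))

z = fisher_z(9.40, 9, 3.90, 9)
ratio = (9.40 / 9) / (3.90 / 9)
print(round(z, 4), round(ratio, 2))
```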
There is a point here which is frequently overlooked. In carrying out the $z$-test we
always take the ratio of the larger variance to the smaller, so that our probability levels
relate, not to the chance that a given pair of variances have a larger ratio than the observed
one, but to the chance that the bigger of the two exceeds the smaller in a certain ratio.
A probability of 0.05 thus relates to the chance that either $s_1^2/s_2^2$ exceeds a given amount
$k$, or $s_1^2/s_2^2$ falls short of a given amount $1/k$. If we are interested only in the former
contingency our probabilities should be halved.


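Properties of Fisher’s Distribution

21.33. The $z$-distribution plays a very important part in statistical inference based
on small samples, and we digress at this point to give an account of its main features.

The distribution function of $z$ may be obtained from the incomplete $B$-function, for
$z$ may be easily transformed into a Type I variate. There are, however, special tables
for lower values of $\nu_1$ and $\nu_2$ and satisfactory approximations of various kinds for higher
values.

The characteristic function of $z$ is proportional to

$$\int_{-\infty}^{\infty} \frac{e^{(\nu_1 + \theta)z}\, dz}{(\nu_1 e^{2z} + \nu_2)^{\frac{1}{2}(\nu_1 + \nu_2)}},$$

where $\theta = it$, and is thus

$$\phi(t) = \left(\frac{\nu_2}{\nu_1}\right)^{\frac{1}{2}\theta} \frac{\Gamma\{\frac{1}{2}(\nu_1 + \theta)\}\, \Gamma\{\frac{1}{2}(\nu_2 - \theta)\}}{\Gamma(\frac{1}{2}\nu_1)\, \Gamma(\frac{1}{2}\nu_2)}. \qquad (21.52)$$

Thus, taking logarithms and using the expansion

$$\log \Gamma(1 + x) = \tfrac{1}{2} \log 2\pi + (x + \tfrac{1}{2}) \log x - x + \frac{1}{12x} - \ldots,$$

we find

$$\log \phi(t) = -\frac{\theta}{2}\left(\frac{1}{\nu_1} - \frac{1}{\nu_2}\right) + \frac{\theta^2}{4}\left(\frac{1}{\nu_1} + \frac{1}{\nu_2}\right) + \ldots \qquad (21.53)$$

Thus, for large $\nu_1$ and $\nu_2$, $z$ is distributed normally with mean
$-\dfrac{1}{2}\left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)$ and variance $\dfrac{1}{2}\left(\dfrac{1}{\nu_1} + \dfrac{1}{\nu_2}\right)$.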


21.34. Various approximations have been given for the case when $\nu_1$ and $\nu_2$ are
not large enough to justify the assumption of normality.

(a) (Cornish and Fisher, 1937). The method is that of 6.32 and depends on the
expansion of the distribution in a Gram-Charlier series. From the successive derivatives
of $\log \Gamma(1 + x)$ we can find those of $\phi(t)$, and hence ascertain the cumulants of $z$. Writing
$r_1 = \dfrac{1}{\nu_1}$ and $r_2 = \dfrac{1}{\nu_2}$, we find

$$\begin{aligned}
\kappa_1 &= -\tfrac{1}{2}(r_1 - r_2) - \tfrac{1}{6}(r_1^2 - r_2^2) \\
\kappa_2 &= \tfrac{1}{2}(r_1 + r_2) + \tfrac{1}{2}(r_1^2 + r_2^2) + \tfrac{1}{3}(r_1^3 + r_2^3) \\
\kappa_3 &= -\tfrac{1}{2}(r_1^2 - r_2^2) - (r_1^3 - r_2^3) \\
\kappa_4 &= r_1^3 + r_2^3 + 3(r_1^4 + r_2^4) \\
\kappa_5 &= -3(r_1^4 - r_2^4) \\
\kappa_6 &= 12(r_1^5 + r_2^5)
\end{aligned} \qquad (21.54)$$
Hence, putting $\sigma = r_1 + r_2$ and $\delta = r_1 - r_2$, we find for the $l$’s of 6.32 ($m = 0$,
variance $= \tfrac{1}{2}\sigma$)

l 


» = ~ Jl { * d + t da) 

l* = i(o + d i) + i (®* + 3 <5 4 ), 


and so on. After some reduction we find, for the value of 2 corresponding to a probability 
a (which in turn corresponds to a normal deviate f),— 


- (Jl - »«* + 2 > + Jl {u + *> + h V <f ' + *«>} 

II 

V 2 \ 19S 


- T ^(f + «* < »> + <«* + 7{ ‘ + 16 > + 


(f‘ + 20f» + 16f) 


+ 44 ^ t ' 18 * f ) + (»^ 5 - 284 * 3 

2880 155520<r 2 


- 1513$)J 


. (21.55) 


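(b) (Fisher, extended by Cochran, 1940a). Writing $n$ indifferently for $\nu_1$ and $\nu_2$, we
have, from (21.55), to order $n^{-2}$,

$$z = \xi\sqrt{\frac{\sigma}{2}} - \frac{\delta}{6}(\xi^2 + 2) + \sqrt{\frac{\sigma}{2}}\left\{\frac{\sigma}{24}(\xi^3 + 3\xi) + \frac{\delta^2}{72\sigma}(\xi^3 + 11\xi)\right\},$$

where $\sigma = \dfrac{1}{\nu_1} + \dfrac{1}{\nu_2}$, $\delta = \dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}$ and $\xi$ is the normal deviate corresponding to the
probability level. Put $h = 2/\sigma$. Then

$$z = \frac{\xi}{\sqrt{h}} - \frac{\delta}{6}(\xi^2 + 2) + \frac{1}{\sqrt{h}}\left\{\frac{1}{12h}(\xi^3 + 3\xi) + \frac{\delta^2 h}{144}(\xi^3 + 11\xi)\right\}. \qquad (21.56)$$

Now

$$\frac{\xi}{\sqrt{(h - \lambda)}} = \frac{\xi}{\sqrt{h}} + \frac{\lambda\xi}{2h\sqrt{h}} + O(n^{-5/2}).$$

Hence, if we put

$$z = \frac{\xi}{\sqrt{(h - \lambda)}} - \frac{\delta}{6}(\xi^2 + 2), \qquad (21.57)$$

the difference of this quantity from (21.56) is

$$\frac{(\xi^3 + 11\xi)\,\delta^2 \sqrt{h}}{144},$$

provided that we take $\lambda = \dfrac{\xi^2 + 3}{6}$.

The difference is small in virtue of the large denominator and the factor $\delta^2 = \left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)^2$,
which is small if $\nu_1$ and $\nu_2$ are not too different. Thus we may take $z$ as approximately
given by (21.57). The values of $\lambda$ for various values of the significance level are

Level      40%     30%     20%     10%     5%      1%      0.1%

$\lambda$          0.51    0.55    0.62    0.77    0.95    1.40    2.09

For the commoner levels of significance the form taken by (21.57) is

20 per cent. level: $z = \dfrac{0.8416}{\sqrt{(h - \lambda)}} - 0.4514\,\delta$ (21.58)

5 per cent. level: $z = \dfrac{1.6449}{\sqrt{(h - \lambda)}} - 0.7843\,\delta$ (21.59)

1 per cent. level: $z = \dfrac{2.3263}{\sqrt{(h - \lambda)}} - 1.235\,\delta$ (21.60)

0.1 per cent. level: $z = \dfrac{3.0902}{\sqrt{(h - \lambda)}} - 1.926\,\delta$ (21.61)

The accuracy of the approximation for $\nu_1 = 24$, $\nu_2 = 60$ may be judged from the following
comparison:

Level          Value of z from     Exact Value.
per cent.      (21.57).

  20             0.1337              0.1338
   1             0.3748              0.3746
   0.1           0.4966              0.4955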


(c) (Paulson, 1942). The Wilson-Hilferty approximation to $\chi^2$ of 12.7 indicates that
$\left(\dfrac{\chi^2}{\nu}\right)^{1/3}$ is distributed normally about mean $1 - \dfrac{2}{9\nu}$ with variance $\dfrac{2}{9\nu}$. The variance ratio $p^2$ itself
is the ratio of two independent quantities distributed as $\chi^2/\nu$ with $\nu_1$ and $\nu_2$ degrees of
freedom. Further, in virtue of Geary’s theorem (Vol. I, p. 253), the ratio of two such normal
variates leads to a quantity which is approximately normally distributed in standard measure.

We may thus regard

$$U = \frac{\left(1 - \dfrac{2}{9\nu_2}\right) p^{2/3} - \left(1 - \dfrac{2}{9\nu_1}\right)}{\left(\dfrac{2}{9\nu_2}\, p^{4/3} + \dfrac{2}{9\nu_1}\right)^{1/2}} \qquad (21.62)$$

as approximately normally distributed in standard measure. The approximation seems
remarkably good. For instance, the following shows the exact and approximate values
of $p^2$ for $\nu_1 = 6$, $\nu_2 = 12$.

Level          $p^2$, from          Exact Value.
per cent.      (21.62).

  20             1.72                1.72
   5             3.00                3.00
   1             4.85                4.82
   0.1           8.58                8.38
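The quality of Paulson's approximation can be examined with a short sketch. It assumes the normalised form $U = \{(1 - 2/9\nu_2)F^{1/3} - (1 - 2/9\nu_1)\} / \{(2/9\nu_2)F^{2/3} + 2/9\nu_1\}^{1/2}$, where $F$ denotes the variance ratio, and evaluates it at the exact 5 per cent. value 3.00 for $\nu_1 = 6$, $\nu_2 = 12$; the result should lie close to the normal deviate 1.645. The function name `paulson_u` is illustrative.

```python
import math

def paulson_u(f_ratio, nu1, nu2):
    """Approximately standard normal deviate for a variance ratio on (nu1, nu2) d.f."""
    a1 = 2.0 / (9.0 * nu1)
    a2 = 2.0 / (9.0 * nu2)
    cube_root = f_ratio ** (1.0 / 3.0)
    numerator = (1.0 - a2) * cube_root - (1.0 - a1)
    return numerator / math.sqrt(a2 * cube_root ** 2 + a1)

u_5pc = paulson_u(3.00, 6, 12)
print(round(u_5pc, 3))
```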
The Problem of k Samples

21.35. We now proceed to consider the case when we have samples from $k$ different
populations and wish to determine whether there is any evidence of significant differences
between those populations. In some cases the appropriate test can be carried out by the
$\chi^2$-distribution, particularly if the data are grouped. For the groups may then be regarded
as determining the rows of a contingency table and the different samples the columns, and
a homogeneity test applied to the table in the manner of Chapter 12. Again, we may
compare the samples pair by pair by the foregoing methods; but this, apart from being
tedious, does not give us what we want, namely a test of homogeneity of the set of samples
taken together.

21.36. Consider in the first instance the sampling of attributes. Suppose we have
samples from populations in which the true proportions of successes are $\varpi_1, \ldots, \varpi_k$, the observed
proportions being $p_1, \ldots, p_k$ and the sample numbers $n_1, \ldots, n_k$, totalling $n$.

If $p$ is the mean proportion of successes in all samples taken together, and our hypothesis
is that the populations have a common value $\varpi$, $p$ will be an estimate of $\varpi$ and we have for
the variance of $p_j$

$$\operatorname{var} p_j = \frac{pq}{n_j}$$

approximately, where

$$p = \frac{1}{n} \sum n_j p_j. \qquad (21.63)$$

It follows that $\dfrac{p_j - p}{\sqrt{(pq/n_j)}}$ will be distributed normally about zero mean with unit
variance, and hence

$$\chi^2 = \frac{\sum \{n_j (p_j - p)^2\}}{pq} \qquad (21.64)$$

is distributed in the Type III form with $k - 1$ degrees of freedom (not $k$ because we have lost
a degree by estimating $p$). Hence the ratio

$$Q^2 = \frac{\sum n_j (p_j - p)^2}{pq\,(k - 1)} \qquad (21.65)$$

has expectation unity. The quantity $Q$ is called the Lexis ratio, after the author who
first discussed it in detail (Lexis, 1903).*


* Lexis first developed the use of $Q$ in a paper “Über die Theorie der Stabilität statistischer Reihen,”
1879, Conrad’s Jahrbücher, 32, 60, reproduced in the reference given above. He dealt, however, only
with the case when all the $n$’s were equal and had no knowledge of the sampling distribution of $Q$. In
practical applications he took as each $n_j$ the average for the group. “The error thereby committed
can be judged by calculating $n$ once with the largest and once with the smallest base number.”
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Example 21.8 

From 1910 to 1919 the numbers of live male and female births in England and Wales
were as follows:

Year      Male Births    Female Births    Total Births    Proportion Male/Total
1910        457,266         439,696          896,962            0.5098
1911        448,933         432,205          881,138            0.5095
1912        445,004         427,733          872,737            0.5099
1913        449,159         432,731          881,890            0.5093
1914        447,184         431,912          879,096            0.5087
1915        415,205         399,409          814,614            0.5097
1916        402,137         383,383          785,520            0.5119
1917        341,361         326,985          668,346            0.5108
1918        339,112         323,549          662,661            0.5117
1919        356,241         336,197          692,438            0.5145
Totals    4,101,602       3,933,800        8,035,402            0.5104


The proportion of male births showed an increase during the war years 1916-1919. 
This is a well-known effect of war, but suppose we had noticed it here for the first time. 
The natural question is : can the effect be accidental ? There is no doubt about its reality , 
for the data cover the whole population ; but if we suppose that sex at birth is distributed 
according to the laws of chance, do the differences observed suggest that in the ten years 
concerned there was a significant change in the population (as regards proportion of male 
births) ? Let us consider the homogeneity test applied to the 10 proportions. 

We have p = 0.5104, n = 8,035,402, k - 1 = ν = 9 and the sum $\sum n_j (p_j - p)^2$ will
be found to be 19.895783. Hence

$$Q = \sqrt{\frac{19.895783}{9 \times 0.5104 \times 0.4896}} = 2.974,$$

$$\chi^2 = (k - 1)\,Q^2 = 79.618.$$

Q is sufficiently far from unity to reject decisively the hypothesis that the data are
homogeneous. A χ²-test will confirm the conclusion. We infer that, whatever the reason,
the differences in proportions of male births, slight as they are, cannot be accounted for
on the supposition that the distribution of sex is according to chance in samples from
a constant population. We may observe that, had we obtained the same proportions
for a sample one-tenth the size, χ² would have been 7.962 and we should not have inferred
non-homogeneity.
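The arithmetic of this example can be retraced with a short script. What follows is only a sketch: the function name `lexis_ratio` and its layout are illustrative choices, not anything given in the text, and it works from the unrounded birth counts rather than from p rounded to four places.

```python
# Male and female live births, England and Wales, 1910-1919 (Example 21.8).
births = [
    (457_266, 439_696), (448_933, 432_205), (445_004, 427_733),
    (449_159, 432_731), (447_184, 431_912), (415_205, 399_409),
    (402_137, 383_383), (341_361, 326_985), (339_112, 323_549),
    (356_241, 336_197),
]


def lexis_ratio(successes, sizes):
    """Return (Q, chi-square) for k samples of proportions."""
    n = sum(sizes)
    p = sum(successes) / n                      # pooled proportion of successes
    q = 1.0 - p
    k = len(sizes)
    weighted_ss = sum(
        n_j * (s_j / n_j - p) ** 2 for s_j, n_j in zip(successes, sizes)
    )
    chi_square = weighted_ss / (p * q)          # referred to k - 1 degrees of freedom
    return (chi_square / (k - 1)) ** 0.5, chi_square


males = [m for m, _ in births]
totals = [m + f for m, f in births]
Q, chi_square = lexis_ratio(males, totals)
print(f"Q = {Q:.3f}, chi-square = {chi_square:.2f}, d.f. = {len(births) - 1}")
```

Because the script does not round p or the yearly proportions, its output should be expected to agree with the figures of the example only to about the second decimal place.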


21.37. A similar test may be applied with k samples of variables. Let the samples be

$$\begin{array}{ll}
x_{11}, x_{12}, \ldots, x_{1n_1} & \text{with mean } \bar{x}_1 \\
x_{21}, x_{22}, \ldots, x_{2n_2} & \text{with mean } \bar{x}_2 \\
\quad\vdots & \\
x_{k1}, x_{k2}, \ldots, x_{kn_k} & \text{with mean } \bar{x}_k.
\end{array}$$

The variance of the jth sample is

$$\frac{1}{n_j}\sum_{l=1}^{n_j} (x_{jl} - \bar{x}_j)^2,$$

and an estimate of the population variance may be obtained by taking the weighted mean
of sample variances

$$s_1^2 = \frac{1}{n - k}\sum_{j}\sum_{l} (x_{jl} - \bar{x}_j)^2. \tag{21.66}$$

Here we have reduced the divisor to n - k so as to correspond with the number of degrees
of freedom.

Furthermore $\bar{x}_j$ will be distributed with variance $\sigma^2/n_j$ and hence (assuming without
loss of generality that the parent mean is zero),

$$E\sum_{j=1}^{k}\{n_j(\bar{x}_j - \bar{x})^2\} = E\Big\{\sum (n_j\bar{x}_j^2)\Big\} - E(n\bar{x}^2) = k\sigma^2 - \sigma^2 = (k - 1)\,\sigma^2.$$

Putting then

$$s_2^2 = \frac{1}{k - 1}\sum_j n_j(\bar{x}_j - \bar{x})^2, \tag{21.67}$$

we have another estimate of σ². Within sampling limits $s_1^2$ and $s_2^2$ should be equal. If
they are not, we suspect the homogeneity of the population.


21.38. The above test is a simple form of the analysis of variance, which we shall 
study extensively in Chapters 23 and 24 ; it is therefore unnecessary for us to develop it 
further at the present stage. Essentially the test is one of simultaneous significance of 
differences between means on the assumption that variances are constant. We shall also 
discuss in Chapter 26 a generalisation of the variance ratio for testing the homogeneity 
of a set of variances . 


Example 21.9 

The following table (from the Registrar-General’s Statistical Review of England and 
Wales for 1933 , Part II) shows the numbers of males married in England in that year 
classified according to age and district. (Certain small numbers of unspecified age and 
those under 21 have been omitted.) 


                                     Age (Years)
District         21-       25-       30-       35-       45-       55-      Totals
South-East     31,714    43,979    14,995     7,985     3,928     3,717    106,318
North          31,507    39,849    13,620     7,108     3,362     2,916     98,362
Midland        17,465    21,486     6,729     3,340     1,624     1,509     52,153
East            4,016     5,297     1,820       962       457       386     12,938
South-West      4,323     6,065     2,218     1,177       514       580     14,877
Totals         89,025   116,676    39,382    20,572     9,885     9,108    284,648


Note the changes in interval at 25- and 35- years.

The question we shall consider is whether age at marriage differs significantly between
different districts. This might, for example, be an important point if we were about to
sample the population for some quality related to age at marriage, such as the number
of children per family. The data might be regarded as a contingency table and χ² used
as a test of independence in the usual way. Here we adopt an alternative by considering
the mean age at marriage in the five different districts.

Taking the centres of the intervals to be 23, 27.5, 32.5, 40, 50 and 57.5 years (the latter
being admittedly an approximation) and making no corrections for grouping, we find:


                                  Mean        Sum of Squares of
District            Number       (years)     Deviations from Mean    Variance
South-East         106,318      29.681799         7,092,490           66.710
North               98,362      29.312626         6,092,375           61.938
Midland             52,153      29.007344         3,105,520           59.546
East                12,938      29.425761           807,911           62.445
South-West          14,877      29.873731         1,025,284           68.917
Whole population   284,648      29.429049        18,143,921           63.741


The total of the sum of squares about district means, $\sum\sum (x_{jl} - \bar{x}_j)^2$, is the sum of the
figures in the fourth column, namely 18,123,580. The sum of squares $\sum n_j (\bar{x}_j - \bar{x})^2$ is
found to be 20,341. We have the useful check that these two together are equal to the
sum of squares of deviations from the population mean, 18,143,921 (a property which we
shall often require in the analysis of variance).

Thus the two estimates of variance, within and between districts respectively, are

$$s_1^2 = \frac{18{,}123{,}580}{284{,}643} = 63.67, \qquad s_2^2 = \frac{20{,}341}{4} = 5085.25.$$

No test of significance is required to see that the difference in mean age at marriage between
districts is not a chance effect.
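The two variance estimates of this example can be recomputed directly from the grouped frequencies. The sketch below is illustrative only: the helper `grouped_stats` is a hypothetical name, the interval centres are those adopted in the text, and the 25- entries for East and South-West are taken as 5,297 and 6,065, the values that reproduce the printed row and column totals.

```python
# Interval centres (years) adopted in Example 21.9, with no grouping corrections.
centres = [23.0, 27.5, 32.5, 40.0, 50.0, 57.5]

# Males married in 1933, by district and age group.
counts = {
    "South-East": [31_714, 43_979, 14_995, 7_985, 3_928, 3_717],
    "North":      [31_507, 39_849, 13_620, 7_108, 3_362, 2_916],
    "Midland":    [17_465, 21_486, 6_729, 3_340, 1_624, 1_509],
    "East":       [4_016, 5_297, 1_820, 962, 457, 386],
    "South-West": [4_323, 6_065, 2_218, 1_177, 514, 580],
}


def grouped_stats(freqs, values):
    """Return (number, mean, sum of squared deviations) for grouped data."""
    n = sum(freqs)
    mean = sum(f * v for f, v in zip(freqs, values)) / n
    ss = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))
    return n, mean, ss


stats = {name: grouped_stats(f, centres) for name, f in counts.items()}
n_total = sum(n for n, _, _ in stats.values())
k = len(stats)
grand_mean = sum(n * mean for n, mean, _ in stats.values()) / n_total

within_ss = sum(ss for _, _, ss in stats.values())
between_ss = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in stats.values())

s1_sq = within_ss / (n_total - k)    # estimate from variation within districts
s2_sq = between_ss / (k - 1)         # estimate from variation between district means

print(f"within  SS = {within_ss:,.0f}, estimate = {s1_sq:.2f}")
print(f"between SS = {between_ss:,.0f}, estimate = {s2_sq:.2f}")
```

The ratio of the two estimates is so large that, as the text remarks, no formal test is needed.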


Tests of Random Order 

21.39. The tests described above are concerned with the values of a number of 
sample members but not with the order in which these values occur. Sometimes there 
may not be an order, as, for instance, if a number of plants are grown simultaneously or 
a number of names drawn from a hat in a single handful. More frequently there is a temporal
order of appearance in the values, and it is clear that, on some occasions at least,
the order may be material. To take an extreme case, suppose we are told that in a sample
of 100 births 53 are male. We conclude that the sample is concordant with the hypothesis
that male and female births occur at random with probability ½. But if we knew in addition
that the first 53 births were male and the next 47 female we should almost certainly reject
the hypothesis.

21.40. If sampling is conducted by taking members one at a time from a population
and the process is random, then any order is as probable as any other order. The sample
may be considered as a section of an infinite series generated by the sampling process, and
this series ought to behave like von Mises’ Irregular Kollektiv (7.15). It is a happy
hunting-ground for the theorist, since there is no limit to the number of tests which can 
be invented to ascertain whether a given finite series conforms to the random scheme. We 
have considered a few such tests in connection with random sampling numbers (8.15) 
and shall discuss others in connection with time-series (Chapter 30). Here we discuss a 
few tests which are useful in detecting departures from randomness in the sampling. We 
are not now considering hypotheses as to the parent population, but since the randomness 
of the sampling is an essential element of inferences in probability it is convenient to 
consider the reliability of the sampling, together with inferences from the sample about 
the parent. 

Ranking Tests 

21.41. Suppose we have a sample of n members $x_1 \ldots x_n$, in that order, and are
doubtful about its randomness. Such doubts may arise owing either to defects in the
sampling or to possible alterations in the population while the sampling is going on. In
the first case the process itself is at fault; in the second, circumstances are at work to make
the sample something other than it purports to be, a random sample from a single population.
Either influence may relate the magnitude of the x's to the order in which they
occur, and the values $x_1 \ldots x_n$ are not then a random order in the sense that any other
order was equally probable.

Let us then consider all the possible orders, n! in number, of the observed values
$x_1 \ldots x_n$. A proportion of these, determined by a significance level of 5 per cent. or
1 per cent., say, we will decide to reject as improbable; and we will select as the “improbable”
rankings those which exhibit the systematic appearance of which we are afraid,
and particularly the regular rise or fall from $x_1$ to $x_n$ in magnitude. In short, we rank the
sample in order of magnitude, say $X_1 \ldots X_n$, where the X's are a permutation of
the first n integers, and compute a rank correlation coefficient between this order and the
order 1 … n. If the coefficient is large in absolute value (“large” being determined
by the significance level) we suspect the sample of being subject to systematic influences.

Example 21.10 

Thirty persons in the income group £1000-£1500 are asked to supply returns of their
annual income for some purpose connected with taxation. It is intended to summarise
their replies by a given date, but when that date arrives only 20 answers have been received.
This is a frequent event in postal inquiries, even when the return is compulsory, and it
has to be decided whether the 20 returns may be accepted as representative of the 30.
There are prior reasons for suspecting that persons with bigger incomes may delay more
than the others, partly because of difficulty in completing returns and partly because of
a natural reluctance to part with information which may tell against them.* We therefore
wish to ascertain from the 20 returns whether there is any evidence that persons with
smaller incomes tend to submit returns earlier than those with larger incomes.

* This is an assumption for the purposes of the example and not intended as a statement about
taxation returns in real life.

Suppose the 20 returns give incomes, in that order, of £ per annum: 1180, 1270, 1400,
1090, 1190, 1250, 1170, 1300, 1290, 1310, 1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360,
1220, 1460. The ranking order is:

No. of sample    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Rank             3   7  17   1   4   6   2  10   9  11   8  13  12  15  18  16  20  14   5  19
Difference      -2  -5 -14   3   1   0   5  -2   0  -1   3  -1   1  -1  -3   0  -3   4  14   1

The sum of squares of differences is 508 and thus the Spearman coefficient of rank
correlation between observed and natural order 1 … n is

$$\rho = 1 - \frac{6 \times 508}{7980} = 0.618.$$

The probability of obtaining such a value or greater (16.18) may be found from “Student’s”
distribution by putting

$$t = \rho\sqrt{\frac{n - 2}{1 - \rho^2}}, \qquad \nu = 18,$$

and is found from Appendix Table 3, vol. I, to be about 0.004. The test confirms our
suspicion that size of income is correlated with order of appearance, and if we intend to
use the mean income of the 20 returns as an estimate of the income in the full 30 we must
recognise that it may very well be an under-estimate.
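The ranking, the sum of squared differences and the coefficient can be checked mechanically. The sketch below assumes there are no ties (as is the case here); the helper name `ranks` is illustrative, and the conversion to t uses the usual Student approximation with n - 2 degrees of freedom.

```python
from math import sqrt

# Incomes in order of receipt (Example 21.10).
incomes = [1180, 1270, 1400, 1090, 1190, 1250, 1170, 1300, 1290, 1310,
           1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360, 1220, 1460]


def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


n = len(incomes)
rank = ranks(incomes)

# Differences between order of appearance and rank in magnitude.
sum_d2 = sum((position - r) ** 2 for position, r in enumerate(rank, start=1))

rho = 1 - 6 * sum_d2 / (n ** 3 - n)           # Spearman's coefficient
t = rho * sqrt((n - 2) / (1 - rho ** 2))      # referred to Student's t, n - 2 d.f.

print(rank)
print(f"sum of squared differences = {sum_d2}, rho = {rho:.3f}, t = {t:.3f}")
```

The tail probability itself still has to be read from a table of the t-distribution, as in the text.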


21.42. It will be noted in this example that we have made no assumption about 
the distribution of incomes in the sample or the population (the latter of which would 
certainly not be normal) and have used the sample values themselves without any reference 
to the question whether they were representative. This does not invalidate our inference, 
which is made within the population of samples obtained by permuting the observed values. 
(Cf. 17.44 and 17.45.) 

21.43. A second test of use in random series, particularly when it is suspected that
cyclical effects are present, may be obtained by counting the occurrences of “peaks” or
“troughs” in the series. A member is said to be a “peak” if it is greater than the two
neighbouring members, and a “trough” if it is less than those members. In either case
it is a “turning-point”. The interval between turning-points is called a “phase”.

Three consecutive observations are required to define a turning-point. If the series
is random the probability that any given three provides a turning-point is ⅔, for the values
$x_1, x_2, x_3$ may occur in six orders and in only four is the greatest or least value the middle
one. In a series of N terms there are N - 2 sets of three, and hence the expected number
of turning-points p is

$$E(p) = \tfrac{2}{3}(N - 2). \tag{21.68}$$
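For short series the expectation (21.68) can be verified by listing every possible order. The sketch below does this with exact fractions; it also prints the exact variance from the same enumeration, set beside the value (16N - 29)/90 quoted in the text. The function names are illustrative, and the enumeration is practical only for small N since it visits all N! orders.

```python
from fractions import Fraction
from itertools import permutations


def turning_points(series):
    """Count peaks and troughs: members above, or below, both neighbours."""
    return sum(
        1
        for a, b, c in zip(series, series[1:], series[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def exact_mean_and_variance(n):
    """Mean and variance of the number of turning points over all n! orders."""
    counts = [turning_points(order) for order in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    variance = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, variance


for n in range(4, 8):
    mean, variance = exact_mean_and_variance(n)
    # Columns: N, exact mean, 2(N - 2)/3, exact variance, (16N - 29)/90.
    print(n, mean, Fraction(2 * (n - 2), 3), variance, Fraction(16 * n - 29, 90))
```

Using `Fraction` avoids any rounding, so agreement between the enumerated and quoted values is exact rather than approximate.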

The variance and higher moments of p are not so easy to determine. Like the ranking
problems considered in Chapter 16 (to which the present problem is analogous), the
distributions resulting are rather complicated. We quote without proof the results

$$\mu_2(p) = \frac{16N - 29}{90}, \tag{21.69}$$

$$\mu_3(p) = -\frac{16(N + 1)}{945}, \tag{21.70}$$

$$\mu_4(p) = \frac{448N^2 - 1976N + 2301}{4725}. \tag{21.71}$$

As N tends to infinity the distribution tends to normality fairly rapidly, and p may,
for finite N, be taken as normally distributed about mean ⅔(N - 2) with variance
(16N - 29)/90.

21.44. A further test may be derived from the distribution of phase lengths. The
probability of a phase of length d in a series of d + 1 terms is clearly $2/(d + 1)!$, for only
two of the possible permutations are favourable. In a series of length N there are
N - d - 2 possible phases of length d, for d + 3 points are required to determine the
phase. The probability of a phase d in d + 3 terms is

$$2\left\{\frac{1}{(d+1)!} - \frac{1}{(d+2)!}\right\} - 2\left\{\frac{1}{(d+2)!} - \frac{1}{(d+3)!}\right\} = \frac{2(d^2 + 3d + 1)}{(d+3)!}, \tag{21.72}$$

and hence the expected number of phases of length d is

$$\frac{2(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}. \tag{21.73}$$

Now the number of possible phases is

$$\tfrac{2}{3}(N - 2) - 1 + \frac{2}{N!} = \tfrac{1}{3}(2N - 7) + \frac{2}{N!}, \tag{21.74}$$

for there is one fewer phase than turning-points, ⅔(N - 2) in number, and the whole
series may be a phase, which accounts for the term 2/N!. In practice this is negligible,
and for the probability of a phase d in a series of N we then have (21.73) divided by (21.74),
namely

$$\frac{6(d^2 + 3d + 1)(N - d - 2)}{(d+3)!\,(2N - 7)}. \tag{21.75}$$

The moments of this distribution are easily obtained to a very close approximation.
For example,

$$\mu_1'(d) = \frac{6}{2N - 7}\sum_d \frac{d\,(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}$$

$$= \frac{6}{2N - 7}\sum_d \Big[(N - 2)\{(d+3)(d+2)(d+1) - 3(d+3)(d+2) + 5(d+3) - 3\}$$
$$\qquad - (d+3)(d+2)(d+1)d + 3(d+3)(d+2)(d+1) - 8(d+3)(d+2) + 13(d+3) - 9\Big]\Big/(d+3)!$$

$$= \frac{6}{2N - 7}\sum_d \Big[(N - 2)\Big\{\frac{1}{d!} - \frac{3}{(d+1)!} + \frac{5}{(d+2)!} - \frac{3}{(d+3)!}\Big\}$$
$$\qquad - \frac{1}{(d-1)!} + \frac{3}{d!} - \frac{8}{(d+1)!} + \frac{13}{(d+2)!} - \frac{9}{(d+3)!}\Big].$$

Remembering the rapid convergence of $\sum 1/d!$ to e, we may write this

$$\frac{6}{2N - 7}\Big[(N - 2)\{(e - 1) - 3(e - 2) + 5(e - \tfrac{5}{2}) - 3(e - \tfrac{8}{3})\}$$
$$\qquad - e + 3(e - 1) - 8(e - 2) + 13(e - \tfrac{5}{2}) - 9(e - \tfrac{8}{3})\Big],$$

so that

$$\mu_1'(d) = \frac{3(N + 7 - 4e)}{2N - 7} \simeq \frac{3}{2}. \tag{21.76}$$

Similarly we find

$$\mu_2(d) = \frac{3}{(2N - 7)^2}\{(8e - 21)N^2 + (4e - 17)N - (48e^2 - 140e + 14)\} \simeq 0.560. \tag{21.77}$$
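The closed forms for the mean and variance of phase length can be compared with direct summation of the probabilities (21.75). The following is a sketch under the same approximation as the text (the term 2/N! is ignored); the function names are illustrative.

```python
from math import e, factorial


def phase_probability(d, n):
    """Probability (21.75) of a phase of length d in a random series of n terms."""
    return 6 * (d * d + 3 * d + 1) * (n - d - 2) / (factorial(d + 3) * (2 * n - 7))


def phase_moments(n):
    """Total probability, mean and variance, summed over d = 1, ..., n - 3."""
    lengths = range(1, n - 2)
    total = sum(phase_probability(d, n) for d in lengths)
    mean = sum(d * phase_probability(d, n) for d in lengths)
    second = sum(d * d * phase_probability(d, n) for d in lengths)
    return total, mean, second - mean ** 2


n = 48
total, mean, variance = phase_moments(n)

# Closed forms obtained by replacing the finite sums of 1/d! with e.
mean_closed = 3 * (n + 7 - 4 * e) / (2 * n - 7)
variance_closed = 3 * (
    (8 * e - 21) * n ** 2 + (4 * e - 17) * n - (48 * e ** 2 - 140 * e + 14)
) / (2 * n - 7) ** 2

print(f"total probability = {total:.12f}")
print(f"mean:     summed {mean:.6f}, closed form {mean_closed:.6f}")
print(f"variance: summed {variance:.6f}, closed form {variance_closed:.6f}")
```

For a series as long as N = 48 the neglected tails of the factorial series are far below rounding error, which is why the two columns should agree so closely; for very short series the discrepancy would be visible.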

21.45. In comparing observed distributions of phases with expected values the
ordinary χ²-test cannot be applied, because the probabilities of the events in a finite series
are not independent. A test of significance has been derived by Wallis and Moore (1941),
who consider a grouping into three categories, d = 1, d = 2 and d ≥ 3. They conclude
that χ² calculated from these three groups can be tested in the usual Type III form
with ν = 2½ if χ² > 6.3. For lower values 6/7 χ² can be tested in that form with ν = 2.

This test is independent of the law of distribution of the variables and is thus of general 
application. It has to be remembered, however, that generality in these matters may 
be offset by loss of sensitivity, and more searching tests may be required in certain cases. 


Example 21.11 

The following table shows the deviations from a moving nine-year average of potato 
yields in England and Wales for the years 1888-1935 (units are -/ 6 th ton):— 


Year    Yield       Year    Yield       Year    Yield       Year    Yield
1888     -6         1900     -1 T       1912    -15 T       1924     -1 T
  89     +2 P         01     +6 P         13     +3 P         25     +2 P
  90     -4 T         02     -3           14     +2           26     -9 T
  91     -3           03     -1 T         15     +1           27     -3
  92     -1           04     +2 P         16     -2 T         28     +9 P
  93     +6 P         05      0 T         17     +5 P         29     +5
  94     -2 T         06     +1 P         18     +4           30     +1
  95     +7 P         07     -7 T         19     -4 T         31    -10 T
  96     +3           08     +8 P         20     -3 P         32     +1
  97     -6 T         09     +4           21     -9 T         33     +2
  98     +2 P         10     +3 T         22    +11 P         34     +5 P
  99      0           11     +4 P         23     -1           35      4


We have marked with P and T the peaks and troughs of the series. The observed
number of turning-points is 31 in a series of 48 terms. The expected number is, from
(21.68), ⅔(48 - 2) = 30.67, almost exactly the number observed. No test of significance
is required.

The duration of phases is:

Phase length     Observed     Predicted (21.75)
1                   20             18.75
2                    6              8.07
3 and over           4              3.18
Total               30             30.00

Here, again, a test is hardly necessary. We find, in fact, χ² = 0.826, 6/7 of which for
ν = 2 is not significant.
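The predicted frequencies and the value of χ² in this example can be retraced as follows. This is a sketch resting on two assumptions that should be stated plainly: the printed figures 18.75 and 8.07 are reproduced by the expected number of phases (21.73) with N = 48, the remainder being allotted to the class "3 and over" so that the predicted total equals the observed 30; and the scaling by 6/7 before reference to ν = 2 is taken to be the Wallis and Moore rule for small χ².

```python
from math import exp, factorial


def expected_phases(d, n):
    """Expected number (21.73) of phases of length d in a random series of n terms."""
    return 2 * (n - d - 2) * (d * d + 3 * d + 1) / factorial(d + 3)


n_terms = 48
observed = {"1": 20, "2": 6, "3 and over": 4}
total = sum(observed.values())

e1 = expected_phases(1, n_terms)
e2 = expected_phases(2, n_terms)
# Assumption: the last class takes whatever is left of the observed total.
predicted = {"1": e1, "2": e2, "3 and over": total - e1 - e2}

chi_square = sum(
    (observed[g] - predicted[g]) ** 2 / predicted[g] for g in observed
)

# Assumed Wallis-Moore adjustment: for small values, refer 6/7 of chi-square
# to the chi-square distribution with 2 degrees of freedom, for which the
# upper tail probability is exp(-x / 2).
scaled = 6 * chi_square / 7
tail_probability = exp(-scaled / 2)

for group in observed:
    print(f"{group:>10}: observed {observed[group]:2d}, predicted {predicted[group]:.2f}")
print(f"chi-square = {chi_square:.3f}, tail probability about {tail_probability:.2f}")
```

The script keeps the predicted frequencies unrounded, so its χ² differs from the printed 0.826 in the third decimal place.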

We conclude that these tests provide no evidence against the randomness of the series
and hence do not suggest any cyclical movement in the yields.

21.46. In the foregoing example we have treated the two values in 1923 and 1924
as a single value since they are equal. These so-called “ties” frequently occur in ranking
work and are a great nuisance. In the present case there is only one, and any reasonable 
method of treating it will not affect the test. Where “ ties ” are numerous enough to 
make a serious difference some systematic method of treating them is desirable, particularly 
if more than two individuals are tied. They may be treated as a single observation, as 
in this case (although it would probably be better then to reduce N accordingly); or, 
preferably, they may be counted as a mean value, e.g. with a tied pair we should consider 
the first as greater than the second and then the second greater than the first, counting the 
number of turning-points or phases as one-half in each case and adding the two together. 
This, as in all similar ranking problems, makes the theoretical discussion of sampling very 
complicated, and if it is desired to make a precise use of significance tests a further possibility
is to assume that the tied members are ranked in the order most unfavourable to
the hypothesis under test, so as to be on the safe side. 

Conditional Tests 

21.47. When several unknown parameters are concerned, it may be difficult to find 
a sampling distribution dependent only on one of them which will form a basis for estimation 
or a test of significance. Sometimes, however, we can get rid of undesirable parameters 
by restricting the distribution in some way, and particularly by considering a distribution 
of samples which have some specified quality in common with the observed sample. Such 
distributions we shall, in Bartlett’s phrase, call conditional. Fisher expresses a similar 
idea by speaking of samples which have the same configuration. 

The most important application of this principle is in the testing of regression 
coefficients, which we shall consider in the next chapter. Here we give a simple illustration 
of the method for the Poisson distribution. 


Example 21.12 

Suppose we have two samples from populations which are known to give the Poisson 
type of distribution but may have different parameters. We wish to determine whether 
the populations could be identical. 

Suppose the frequencies of successes in the two samples are $r_1$ and $r_2$. If λ is the parameter
of the parent (assumed the same for each), the probabilities of the samples are

$$e^{-\lambda}\frac{\lambda^{r_1}}{r_1!} \quad\text{and}\quad e^{-\lambda}\frac{\lambda^{r_2}}{r_2!},$$

and their joint probability is accordingly

$$P\{r_1, r_2 \mid \lambda\} = \frac{e^{-2\lambda}\lambda^{r_1 + r_2}}{r_1!\,r_2!}. \tag{21.78}$$

This depends on λ and does not help us in answering the question. However, for the
probability of a sample with $r_1 + r_2$ successes we have (since the sum of two Poisson variates
with parameters $\lambda_1$, $\lambda_2$ is distributed in the same form with parameter $\lambda_1 + \lambda_2$):

$$P\{r_1 + r_2 \mid \lambda\} = \frac{e^{-2\lambda}(2\lambda)^{r_1 + r_2}}{(r_1 + r_2)!},$$

and hence

$$\frac{P\{r_1, r_2 \mid \lambda\}}{P\{r_1 + r_2 \mid \lambda\}} = \frac{(r_1 + r_2)!}{2^{r_1 + r_2}\,r_1!\,r_2!} = \frac{r!}{2^r\,r_1!\,r_2!}, \tag{21.79}$$

where $r = r_1 + r_2$.

Now in accordance with Bayes’ theorem we have

$$P\{r_1, r_2 \mid \lambda\} = P\{r_1, r_2 \mid r_1 + r_2\}\,P\{r_1 + r_2 \mid \lambda\}$$

and hence

$$P\{r_1, r_2 \mid r\} = \frac{r!}{2^r\,r_1!\,r_2!}. \tag{21.80}$$

Consequently, if we confine our attention to samples for which the total number of successes
is r, the probability of the observed $r_1$ and $r_2$ is independent of λ and is, in fact, the
corresponding term in the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^r$. The probability is clearly that of a partition of
r into the observed $r_1$ and $r_2$, and if it is small we suspect the hypothesis that the samples
emanated from the same population.

This kind of conditional inference raises the same sort of point as we noticed in 17.44.
We decide beforehand that, whatever r turns out to be, we will make the inference in the
population of samples which yield that value of r.
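The independence of the conditional probability from the Poisson parameter is easy to confirm numerically. In the sketch below the counts 3 and 12 are invented purely for illustration, and the summation over partitions no more probable than the observed one is one common convention for attaching a significance level; the text itself speaks only of the probability of the observed partition.

```python
from math import comb, exp, factorial


def poisson(r, lam):
    """Poisson probability of r successes with parameter lam."""
    return exp(-lam) * lam ** r / factorial(r)


def conditional_probability(r1, r2):
    """P{r1, r2 | r1 + r2}: the corresponding term of the binomial (1/2 + 1/2)^r."""
    r = r1 + r2
    return comb(r, r1) / 2 ** r


def tail_probability(r1, r2):
    """Assumed convention: total probability of partitions of r that are
    no more probable than the one observed."""
    r = r1 + r2
    p_obs = conditional_probability(r1, r2)
    return sum(
        conditional_probability(a, r - a)
        for a in range(r + 1)
        if conditional_probability(a, r - a) <= p_obs * (1 + 1e-12)
    )


r1, r2 = 3, 12  # hypothetical counts, not taken from the text

# The ratio of joint to total probability is the same whatever the parameter.
for lam in (0.5, 2.0, 7.0):
    ratio = poisson(r1, lam) * poisson(r2, lam) / poisson(r1 + r2, 2 * lam)
    print(f"lambda = {lam}: ratio = {ratio:.6f}")

print(f"binomial term    = {conditional_probability(r1, r2):.6f}")
print(f"tail probability = {tail_probability(r1, r2):.6f}")
```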


Pitman's Tests 

21.48. In the extreme conditional case we may consider an inference in a population
of samples the members of which are the same as those actually observed, the population
being given by permutations or partitions of the observed values. The tests of ranking
and periodicity given above are cases of this kind. A similar procedure has been advocated
by Fisher in the analysis of variance and the design of experiments, and will be considered
in due course. We now proceed to examine tests of the same nature proposed by Pitman
(1937a, 1938).

Suppose we have two sets of values $u_1 \ldots u_m$ and $v_1 \ldots v_n$ with means $\bar{u}$ and $\bar{v}$
and the mean of the two together equal to $\bar{z}$. Given m + n objects, there are $\binom{m+n}{m}$
ways, say N, of separating them into two sets of m and n objects, of which the given set
is one. We call $|\bar{u} - \bar{v}|$ the spread of the separation. Since

$$m\bar{u} + n\bar{v} = (m + n)\,\bar{z},$$

we have also for the spread

$$\frac{(m + n)\,|\bar{u} - \bar{z}|}{n} = \frac{(m + n)\,|\sum(u) - m\bar{z}|}{mn}. \tag{21.81}$$

Take a probability 1 - α = M/N, where M is an integer. If R is a particular separation,
and the number of separations with spread not less than that of R is not greater than M,
we call R discordant. If there are M or more with a greater spread we call it concordant.
A separation which is neither concordant nor discordant is called neutral. If m = n the
separations occur in pairs with equal spreads, and we then take M to be even. The
discordant separations are most easily picked out as those with the largest values of
$|\sum u - m\bar{z}|$.

If the observed separation is arrived at by chance, the probability that it is discordant
is M/N = 1 - α when there are no neutral separations. If such exist, the probability
is less than 1 - α. Similarly the probability that a separation is concordant is 1 - α,
or more, as the case may be.

Two samples $u_1 \ldots u_m$ and $v_1 \ldots v_n$ are said to be discordant, concordant or
neutral according as the separations u and v are so. Having selected our significance
points dependent on α, and hence having fixed M, we can find for what values of the spreads
a pair of samples is discordant or otherwise, and hence whether our observed pair is so.
If they are discordant we reject the hypothesis that they came from the same population.


Example 21.13 (Pitman, 1937a) 

Two samples have the following values:— 

    0, 11, 12, 20

    16, 19, 22, 24, 29.

Are they significantly different?

There are 9 members altogether and hence $\binom{9}{4} = 126$ separations into samples of
five and four. We take α to be as near as possible to 0.95, corresponding to a 5-per-cent
level of significance, and hence M = 6. We then find the groups which have the largest
values of the spread. We have $\bar{z}$ = 17, so that $m\bar{z}$ = 68, and using the form $|\sum u - 68|$
we find those groups of four from

    0, 11, 12, 16, 19, 20, 22, 24, 29,

which give the maximum value to this quantity. They are:

Group of four         |Σu - 68|
 0, 11, 12, 16            29
 0, 11, 12, 19            26
 0, 11, 12, 20            25
29, 24, 22, 20            27
29, 24, 22, 19            26
29, 24, 20, 19            24

The group 0, 11, 12, 20 gives the fifth largest spread, and so with M = 6 the observed
separation is discordant. Our inference is that the samples come from different populations.
Only in four other cases out of 126 should we get so large a spread in samples from
the same population.
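With only 126 separations the whole reference set of this example can be enumerated. The sketch below lists every group of four, ranks the groups by $|\sum u - m\bar{z}|$ and applies the rule with M = 6; the variable names are illustrative.

```python
from itertools import combinations

u = [0, 11, 12, 20]
v = [16, 19, 22, 24, 29]
pooled = u + v
m = len(u)

# m times the mean of the pooled sample (68 for these data).
target = m * sum(pooled) / len(pooled)


def spread_score(group):
    """|sum(u) - m * z-bar|, which orders separations exactly as the spread does."""
    return abs(sum(group) - target)


scores = sorted((spread_score(g) for g in combinations(pooled, m)), reverse=True)
observed = spread_score(u)

n_separations = len(scores)
at_least_as_large = sum(1 for s in scores if s >= observed)

M = 6  # chosen so that M / 126 is close to the 5-per-cent level
discordant = at_least_as_large <= M

print(f"{n_separations} separations; six largest scores: {scores[:6]}")
print(f"observed score {observed}, rank {at_least_as_large}, discordant: {discordant}")
```

Enumeration of this kind grows rapidly with the sample sizes, which is the practical inconvenience that motivates the approximate form discussed next.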


21.49. The extended use of the above test is barred by practical inconvenience,
but an approximate form based on a different measure of discordance may be used. We
now put

$$w = \frac{m\,(\bar{u} - \bar{z})^2}{(N - m)\,\mu_2}, \tag{21.82}$$

where $\mu_2$ is the variance of the samples taken together and is thus a constant. The function
w is hence linear in $(\bar{u} - \bar{z})^2$, the device of squaring, as usual, getting rid of difficulties
associated with the use of the modulus $|\bar{u} - \bar{z}|$. N here refers to the total sample
m + n.

Now, for the moments of $\bar{u} - \bar{z}$ we may use the results of 11.26 (vol. I, p. 284), giving
the moments of the mean in sampling from a finite population; for $\bar{z}$ is the population
mean. Replacing n in the formulae of that section by m and putting N = m + n,
we have

$$E(\bar{u} - \bar{z}) = 0,$$

$$E(\bar{u} - \bar{z})^2 = \frac{N - m}{m\,(N - 1)}\,\mu_2,$$

$$E(\bar{u} - \bar{z})^4 = \frac{(N - m)\big[\{N^2 + N - 6m(N - m)\}\,\mu_4 + 3N(N - m - 1)(m - 1)\,\mu_2^2\big]}{m^3 (N - 1)(N - 2)(N - 3)},$$

and hence for the first two moments of w we find

$$E(w) = \frac{1}{N - 1}, \tag{21.83}$$

$$E(w^2) = \frac{3}{N^2 - 1}\,(1 + \theta), \tag{21.84}$$

where

$$\theta = \frac{N + 1}{3(N - 2)(N - 3)}\left\{\frac{N(N + 1)}{m(N - m)} - 6\right\}\left\{\gamma_2 + \frac{6}{N + 1}\right\}, \tag{21.85}$$

$\gamma_2$ referring to the measure of kurtosis $\mu_4/\mu_2^2 - 3$.

For fixed N the modulus of the second factor in (21.85) will be found to have a maximum
of $2(N - 2)/N$ when $m = \tfrac{1}{2}N$, and it takes this value again at

$$\frac{N - 2m}{N} = \pm\sqrt{\frac{N - 2}{2N - 1}},$$

giving $\dfrac{m}{N - m} = \tfrac{1}{5}$ or 5 for N = 14 and wider limits for larger N. It will also be found
that for N > 6 the factor $\dfrac{N(N + 1)}{m(N - m)} - 6$ is not greater in absolute value than
$\dfrac{2(N - 2)}{N}$ if

$$\tfrac{1}{4} < \frac{m}{N - m} < 4,$$

i.e. unless one sample is more than four times as big as the other. Thus for such values
and $\gamma_2$ not large, θ is small, and approximately

$$E(w^2) = \frac{3}{N^2 - 1}. \tag{21.86}$$
Similarly, using the fact that for large m and N 

* <* - w - 1 -*-» •••<*- 1 ) (* - 

we find approximately 

$$E(w^3) = \frac{3 \cdot 5}{(N - 1)(N + 1)(N + 3)}. \tag{21.87}$$

The moments given by (21.83), (21.86) and (21.87) are those of the B-distribution

$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}N - 1)}\,(1 - w)^{\frac{1}{2}N - 2}\,w^{-\frac{1}{2}}\,dw, \tag{21.88}$$

which can therefore be used to approximate to the distribution of w. In point of fact the
approximation seems to be remarkably close.

w may also be written

$$w = \frac{\dfrac{mn}{m + n}\,(\bar{u} - \bar{v})^2}{\sum(u - \bar{u})^2 + \sum(v - \bar{v})^2 + \dfrac{mn}{m + n}\,(\bar{u} - \bar{v})^2}, \tag{21.89}$$

which shows that w ≤ 1. We also have

$$\frac{w}{1 - w} = \frac{\dfrac{mn}{m + n}\,(\bar{u} - \bar{v})^2}{\sum(u - \bar{u})^2 + \sum(v - \bar{v})^2}, \tag{21.90}$$

and it is instructive to observe that the function on the right is the same as that of
$\dfrac{u^2}{n_1 + n_2 - 2}$ of (21.32) with a few changes of notation. A transformation of (21.88) to
“Student’s” form will in fact show that we can test $\sqrt{\dfrac{w\nu}{1 - w}}$ in the t-distribution with
ν = m + n - 2; for (21.88) then becomes

$$dF \propto \frac{du}{\left(1 + \dfrac{u^2}{m + n - 2}\right)^{\frac{1}{2}(m + n - 1)}}, \tag{21.91}$$

where

$$u = \sqrt{\frac{w\nu}{1 - w}}. \tag{21.92}$$

21.50. A test of a similar kind may be evolved for the product-moment correlation.
Suppose we have two samples $x_1 \ldots x_n$ and $y_1 \ldots y_n$ and calculate

$$r = \frac{\operatorname{cov} xy}{\sqrt{(\operatorname{var} x \,\operatorname{var} y)}}$$

for every possible pairing of the x's and y's, n! in number. As before, if we choose an
α and hence a number M such that 1 - α = M/n! we may determine those pairings for
which r is greatest and reject the hypothesis that x and y are independent in such cases
if they fall among the M greatest. Since the denominator of r is constant, this is equivalent
to attributing significance to the values of $|\sum xy - n\bar{x}\bar{y}|$ which exceed a given value
determined by α.

Taking $\bar{x} = \bar{y} = 0$, without loss of generality we find

$$E(r) = 0, \tag{21.93}$$

$$E(r^2) = \frac{E(\sum xy)^2}{n^2 \operatorname{var} x \,\operatorname{var} y} = \frac{1}{n - 1}, \tag{21.94}$$

and similarly, if $\gamma_1$, $\gamma_2$ are the modified measures of skewness and kurtosis for x (expressed
in terms of k-statistics, i.e. $\gamma_1 = k_3/k_2^{3/2}$, $\gamma_2 = k_4/k_2^2$) and $\gamma_1'$ and $\gamma_2'$ those for y, it will be found that

$$E(r^3) = \frac{n - 2}{n(n - 1)^2}\,\gamma_1\gamma_1',$$

$$E(r^4) = \frac{3}{(n - 1)(n + 1)} + \frac{(n - 2)(n - 3)}{n(n + 1)(n - 1)^3}\,\gamma_2\gamma_2'.$$

Thus to order $n^{-1}$ we have

$$E(r) = E(r^3) = 0, \tag{21.95}$$

$$E(r^2) = \frac{1}{n - 1}, \tag{21.96}$$

$$E(r^4) = \frac{3}{(n - 1)(n + 1)}. \tag{21.97}$$

These are the first four moments of the distribution

$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}n - 1)}\,(1 - r^2)^{\frac{1}{2}(n - 4)}\,dr, \qquad -1 \le r \le 1. \tag{21.98}$$

Thus r may be tested in this distribution or equivalently, putting

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}, \tag{21.99}$$

in “Student's” form with ν = n - 2.

In particular, if the numbers x and y reduce to rankings, we have the test already
introduced in 21.41. Compare also the result given for the distribution of Spearman’s
ρ in 16.18 (vol. I, p. 401).


The Combination of Tests 

21.51. It sometimes happens that we have a number of tests of significance, all
yielding various probabilities, which we wish to express as a single probability. Suppose,
for instance, that we conduct an experiment five times and that some test, such as that
of the mean, gives probabilities to the observed deviations of 0.2, 0.8, 0.01, 0.1, 0.03. In
the ordinary way two of these values would be regarded as significant and the other three
not. What conclusion are we to draw as to the five taken together?

Suppose we have k values of the probability, $p_1 \ldots p_k$. The distribution of any
particular p is rectangular, i.e.

$$dF = dp, \qquad 0 \le p \le 1.$$

Hence, if x = - log p the distribution of x is

$$dF = e^{-x}\,dx, \qquad 0 \le x \le \infty,$$

and its characteristic function is

$$\phi(t) = \int_0^\infty e^{itx - x}\,dx = \frac{1}{1 - it}.$$

Hence if we write

$$\lambda = -\sum_{j=1}^{k} \log p_j, \tag{21.100}$$

the distribution of λ has a characteristic function

$$\frac{1}{(1 - it)^k}$$

and is therefore given by

$$dF = \frac{1}{\Gamma(k)}\,\lambda^{k-1} e^{-\lambda}\,d\lambda. \tag{21.101}$$

Putting

$$M^2 = 2\lambda = -2\sum \log p = -2\log \prod p, \tag{21.102}$$

we see that the distribution of $M^2$ is

$$dF \propto M^{2k-1}\exp(-\tfrac{1}{2}M^2)\,dM, \tag{21.103}$$

or $M^2$ is distributed as χ² with ν = 2k degrees of freedom.
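The combination rule is simple to apply by machine. The sketch below forms $M^2$ from a set of probabilities and evaluates the upper tail of χ² with 2k degrees of freedom by the finite series valid for an even number of degrees of freedom, $e^{-x/2}\sum_{j<k}(x/2)^j/j!$. It is applied to the five probabilities quoted above and to the five probabilities of Example 21.14, which follows; the function name is illustrative.

```python
from math import exp, log


def combine_probabilities(probabilities):
    """Return (M^2, P) where M^2 = -2 * sum(log p) and P is the probability that
    chi-square with 2k degrees of freedom exceeds M^2."""
    k = len(probabilities)
    m_squared = -2.0 * sum(log(p) for p in probabilities)
    half = m_squared / 2.0
    # Upper tail for even degrees of freedom: exp(-x/2) * sum_{j<k} (x/2)^j / j!
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        series += term
    return m_squared, exp(-half) * series


text_values = [0.2, 0.8, 0.01, 0.1, 0.03]
m2_text, p_text = combine_probabilities(text_values)
print(f"five experiments: M^2 = {m2_text:.2f}, combined probability = {p_text:.4f}")

milk_values = [0.8888, 0.3409, 0.5239, 0.4207, 0.4840]   # Example 21.14
m2_milk, p_milk = combine_probabilities(milk_values)
print(f"milk data:        M^2 = {m2_milk:.2f}, combined probability = {p_milk:.2f}")
```

A single probability reproduces itself under this rule (k = 1 gives back p), which is a convenient check on any implementation.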


Example 21.14 (K. Pearson, 1933b, quoting data from E. M. Elderton, 1933).

Pairs of boys were selected in various age-groups and one member of each pair fed
on raw, the other on pasteurised milk. The differences in gain in weight are shown in
the following table, together with the standard errors of the differences based on large-sample
theory.

(1)               (2)         (3)                  (4)           (5)               (6)
Age-group         Number      Mean Difference      Standard      Probability       log10 p_k
(Central value    of Pairs    in Weight Gained,    Error of      of Observed
in years)                     Raw less             Difference    Difference or
                              Pasteurised                        Greater, p_k

 6½                 73          -0.066              0.054         0.8888            1̄.9488
 7½                 76          +0.022              0.053         0.3409            1̄.5326
 8½                 71          -0.003              0.052         0.5239            1̄.7193
 9½                 77          +0.011              0.055         0.4207            1̄.6240
10½                 60          +0.002              0.057         0.4840            1̄.6849

Total                                                                               2̄.5096

The values of $p_k$ in column (5) are obtained by expressing the observed deviations in column
(3) in terms of the standard error in column (4) and hence determining the probability
from the normal integral. We have

$$M^2 = -2\sum \log_e p = -\frac{2\sum \log_{10} p}{\log_{10} e} = 6.86, \qquad \nu = 10.$$

The probability of a value of % 2 > 6*86 for v = 10 is about 0*74, and the test as a whole 
does not support the hypothesis of a differential effect on feeding between the two kinds 
of milk. 
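
The arithmetic of this example can be checked with a short sketch. It is an illustration only, not part of the original computation: the function name `combine_probabilities` is our own, and the upper tail of $\chi^2$ is obtained from the closed form $e^{-x/2} \sum_{j<k} (x/2)^j / j!$, which holds when the number of degrees of freedom $2k$ is even.

```python
import math

def combine_probabilities(p_values):
    """Combine k independent probabilities as M^2 = -2 * sum(log p).

    Returns (M2, nu, tail), where nu = 2k and tail is the probability that
    chi-squared with nu degrees of freedom exceeds M2.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("each probability must satisfy 0 < p <= 1")
    m2 = -2.0 * sum(math.log(p) for p in p_values)
    nu = 2 * k
    half = m2 / 2.0
    # Closed-form upper tail of chi-squared, valid because nu = 2k is even.
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return m2, nu, tail

# Column (5) of the table in Example 21.14.
p_example = [0.8888, 0.3409, 0.5239, 0.4207, 0.4840]
m2, nu, tail = combine_probabilities(p_example)
print(round(m2, 2), nu, round(tail, 2))
```

Working in natural logarithms avoids the conversion from common logarithms used in the hand computation.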
Nuisance Parameters

21.52. From the foregoing it will have been clear that in the theories of both estimation
and significance one of the main problems is to find a distribution which is independent
of certain unknown parameters in the parent population. Parameters of this kind, necessary
as they are in the specification of the parent and the precise formulation of our problem,
can be a nuisance when we are seeking to make exact statements about some other parameter
on which interest is focussed. For this reason they have been named nuisance
parameters. It may be useful if at this point we summarise the methods available for
getting rid of them.

(a) First of all there is the process of “Studentisation”, whereby we can remove
scale parameters from the sampling distribution by a suitable choice of statistic. (Cf.
19.26.)

(b) Secondly, we may restrict the inference to a sub-population which is conditioned
by having certain values in common with the observed sample. It sometimes happens
that the distribution in this sub-population does not contain the nuisance parameters,
whereas a distribution in the full population would do so (21.47).

(c) In the comparison of two samples, or even the testing of a single sample involving
an unknown mean, that parameter may be eliminated by differencing (21.27). As regards
the case of the single sample, it is clear that if $x_1 \ldots x_n$ are independent and $n$ is even,
the values $x_1 - x_2$, $x_3 - x_4$, $\ldots$, $x_{n-1} - x_n$ will also be independent and be distributed
with zero mean (though of course there are only $\frac{1}{2}n$ of them).

(d) Transformations of the variate may sometimes either eliminate the nuisance
parameter altogether or reduce its importance. The most noteworthy case is Fisher’s
transformation of the correlation coefficient (14.18, vol. I, p. 345). The transformed
function $z - \zeta$ is distributed nearly normally with variance $1/(n - 3)$, so that the difference
of two correlations when transformed does not involve the common value of $\rho$.
(Cf. Example 14.8.)

(e) We may find distributions which are independent of the unknown parameters,
and even of the population, by using the methods of ranking or considering partitions
(21.41, 21.48).

(f) The fiducial argument, in at least one known case, gives a test independent of
unknown parameters, namely the Behrens test (20.13).
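
Methods (a) and (c) can be illustrated numerically. The following is a minimal sketch under our own assumptions: the sample values are invented for illustration and the function names are hypothetical. It shows that the Studentised ratio for a hypothetical mean of zero is unchanged when the scale of the observations is changed, and that the paired differences are unchanged when an unknown constant is added to every observation.

```python
import math

def student_ratio(xs):
    """Studentised mean for a hypothetical parent mean of zero."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean * math.sqrt(n) / math.sqrt(s2)

def paired_differences(xs):
    """Differences x1 - x2, x3 - x4, ... for an even number of observations."""
    if len(xs) % 2:
        raise ValueError("an even number of observations is required")
    return [xs[i] - xs[i + 1] for i in range(0, len(xs), 2)]

sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]  # invented values, for illustration only

# (a) Studentisation: multiplying every observation by 10 leaves the ratio unchanged.
t_original = student_ratio(sample)
t_rescaled = student_ratio([10.0 * x for x in sample])

# (c) Differencing: adding 100 to every observation leaves the differences unchanged.
d_original = paired_differences(sample)
d_shifted = paired_differences([x + 100.0 for x in sample])
```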

It must be realised, however, that all these types of inference do not stand on equal 
footings. In particular (e) requires further examination, as we proceed to show. 

21.53. We may now review the many different tests which have been described in 
this chapter and consider more closely the type of reasoning on which they are based. 
We may group our tests broadly into two classes, those which give a direct test of a given 
value of a parent parameter and those which do not. 

The first class rests on a type of inference which we have discussed fully in connection
with the problem of estimation. There is, in fact, only a difference in viewpoint, and little
or none in essential ideas, between estimating a parameter by assigning a range to acceptable
values (whether by confidence intervals or fiducial intervals) and ascertaining whether
some prior value lies in that range. The significance of parameters in large samples, the
test of the mean in normal samples by “Student’s” distribution, the test of a correlation
coefficient in normal samples, and others of the same kind relating to a specified parameter
have the same logical foundation as the theory of confidence intervals or the theory of
fiducial intervals, whichever is preferred. They all provide for the consideration of alternative
values of the parameter.

21.54. The second group of tests are not, on the face of it, concerned with the value
of a parameter in a parent population, and some of them take no account of possible alternative
hypotheses. Consider, for example, a test of normality or a test of randomness.
The hypothesis is that the population is normal or the sampling is random, as the case
may be, but this does not specify a parameter. What alternatives to normality or to
randomness are we considering, if any? We must have the existence of such alternatives
in mind, however vaguely, for otherwise we should not be testing these particular
hypotheses. But can we say what they are? And if not, do our inferences remain valid?
When working with a probability $\alpha$ shall we still be right in a proportion $\alpha$ of the cases in
the long run?

21.55. The kind of argument we have used in all these cases is this : on the given 
hypothesis the observed sample and all samples providing a greater value of the statistic 
being used for the test have a small probability. Therefore we reject the hypothesis. 

We may note at once that in rejecting the hypothesis we do so in favour of another
hypothesis for which the observations are more probable. We may not express this thought
explicitly, but it is there. The various statistics we use for testing normality, for instance
$b_1$, can arise with greater probability from other populations which are skew or have a
marked deviation from mesokurtosis; the fact is assumed as self-evident (as indeed it
is) and hence, if the statistic is improbable for the normal case there will be non-normal
cases of greater probability. We remark, nevertheless, that the actual probability $\alpha$ is
calculated on the normal hypothesis and does not hold for the non-normal cases. Thus
we can no longer assert that we are right in proportion $\alpha$ of the cases. We are therefore
relying on a less definite principle of inference to the effect that we reject a hypothesis
which gives an improbable value to observation, provided that there exists some other
hypothesis which gives a more probable value.

21.56. A similar argument applies to tests of randomness. It is obvious that many 
other methods of generating a series exist which give a greater probability to a systematic 
series than the random method, and in rejecting the latter we do so more or less consciously 
in favour of the former. Our intuitive feelings on the point lead us to apply one test when 
we have the possibility of systematic order in mind (the ranking test) and another when 
we are interested in oscillations (the phase test). What we are doing, in effect, is selecting 
the test of randomness which we feel to discriminate best between the hypothesis of 
randomness and the alternative possibilities. 

21.57. Although, therefore, much remains to be done in putting tests of normality,
randomness and goodness of fit on a formal logical basis, there do not appear to be any
serious difficulties in doing so insofar as the specification of alternative hypotheses is concerned.
But there remains the difficulty hinted at at the beginning of 21.55. In the
majority of cases we have a probability $1 - \alpha$ that the observed statistic $t_0$ will be exceeded,
and if this is small reject the hypothesis. But why exceeded? Why reject the hypothesis
because of the improbability of a number of events which have not happened?

Here also it seems that a closer inquiry into the logic of the process would be worth
while. We have seen how it can be justified by confidence-interval or fiducial theory
when a parameter is under consideration. When no parameter is specified, the process
must, in the present state of our knowledge, rest on more intuitive ideas. My own view
is that, in a vague kind of way, we are really considering the range of values of a parameter
without realising it. In selecting a statistic to carry out the test, we usually relate it to
the sort of effect we are expecting to divert the real state of affairs from those of
our hypothesis. For instance, if we suspect cyclical effects in a random series we base
a test on oscillations in that series. The further the series deviates from randomness the
greater will be the value of our statistic; and consequently, if we could measure deviation
from randomness (in the direction of cyclicality), we should have a parameter which could
be located in a range in the manner of confidence intervals. Such a range would exclude
the larger values of our statistic if it can be regarded in any sense as estimating the parameter
(or, more generally, as increasing with it); and hence the procedure of rejecting the
hypothesis if the statistic is among these large values may be justified.

21.58. It is for this reason that we began the chapter by defining tests of significance
in relation to a parameter-value given a priori. It seems probable that in the ultimate 
analysis no other definition will be satisfactory. The fact that in this chapter we have 
given tests of hypotheses which do not appear to specify a parameter value is, I think, 
merely a reflection of the fact that the nature of those hypotheses and the inferences about 
them are not usually understood clearly but are based on more or less intuitive ideas. It 
is probable that many of these ideas are sound and can be given explicit logical foundation ; 
but the matter awaits investigation by the statistical logician. 

21.59. There remains for consideration the type of inference used in Pitman’s tests 
(21.48 and 21.49). These are of the character of tests of randomness. Given a set of 
values, we consider all the arrangements in which they could have happened and reject 
the hypothesis if the observed arrangement is improbable. Here again, as it seems to me, 
there is a suppressed series of alternative hypotheses which would make the observed 
value more probable; and in choosing the test, such as the “ spread ” or the high value 
of a correlation, we are intuitively relating the magnitude of a statistic to the deviation 
from randomness. Pitman himself has shown, however, that when the hypothesis is
definite and specifies the difference of two means, the tests give confidence intervals in the
ordinary way (cf. Exercise 21.15).

We shall resume the general theory of tests of significance in Chapter 26. 

NOTES AND REFERENCES 

For the use of the t-distribution in non-normal cases see Geary (1936b) and Bartlett
(1935a), the latter of whom shows that, for moderate samples, departures from mesokurtosis
are not very serious. For approximations to t in the normal case see Hendricks
(1936) and Hotelling and Frankel (1938). For approximations to the z-distribution see
Cochran (1940a), Cornish and Fisher (1937), and Paulson (1942). See also references to
Chapter 23.

For the further theory of the $\chi^2$-test see Neyman and Pearson (1928, 1931a) and for
another test of goodness of fit Neyman (1937a). The theory of 21.44 has been studied
by a number of writers, notably by André (1884), Kermack and McKendrick (1936, 1937),
and Wallis and Moore (1941).

The amalgamation of tests given in 21.51 was apparently first given by Fisher in an
early edition of Statistical Methods for Research Workers and was studied in detail by
K. Pearson (1933b) under the title of the $P_\lambda$ test, and by E. S. Pearson (1938).

For a test of significance of the difference of two variances in samples from a bivariate 
normal population see Hirschfeld (1937), Finney (1938), Pitman (1939c), Morgan (1939), 
and De Lury (1938); and see Exercise 21.3. 

For the tests by Pitman, see his papers of 1937a, 1938. The similar problem in the 
testing of homogeneity in the analysis of variance has also been studied—see references 
to Chapters 23 and 24. 

For the test of difference of means when variances are unequal from the point of view
of confidence intervals see Welch (1938b) and the appendix to this paper by Miss Tanbum.


EXERCISES 


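21.1. For the population represented approximately by

$$dF = \frac{1}{\sqrt{(2\pi)}} \left\{ 1 + \frac{\kappa_3}{6} (x^3 - 3x) \right\} e^{-\frac{1}{2}x^2}\, dx$$

show that, if $\kappa_3^2$ is negligible, the joint probability of a sample $x_1 \ldots x_n$ differs from that
if $\kappa_3$ is zero by a term

$$\frac{\kappa_3}{6\,(2\pi)^{\frac{1}{2}n}} \left\{ \sum_{j=1}^{n} (x_j^3) - 3 \sum_{j=1}^{n} (x_j) \right\} \exp\left( -\tfrac{1}{2} \sum_{j=1}^{n} x_j^2 \right) dx_1 \ldots dx_n.$$

By the transformation

$$y_1 = \frac{1}{\sqrt{2}} (x_1 - x_2)$$
$$y_2 = \frac{1}{\sqrt{6}} (x_1 + x_2 - 2x_3)$$
$$\ldots$$
$$y_n = \frac{1}{\sqrt{n}} (x_1 + x_2 + \ldots + x_n)$$

and the further transformation

$$y_1 = \rho \sin\phi_{n-3} \sin\phi_{n-4} \ldots \sin\phi_1 \sin\phi_0$$
$$y_2 = \rho \sin\phi_{n-3} \sin\phi_{n-4} \ldots \sin\phi_1 \cos\phi_0$$
$$y_3 = \rho \sin\phi_{n-3} \sin\phi_{n-4} \ldots \cos\phi_1$$
$$\ldots$$
$$y_{n-1} = \rho \cos\phi_{n-3}$$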

show that the corrective term to the distribution of “ Student’s ” t is 


dt 


1‘ (r .~ V ‘p)°*p {-£('+$}?■* 


and hence obtain equation (21.11). 


(Geary, 1936b.)


21.2. By the polar transformation of the type of the previous exercise applied to 
all n variates show that if a random sample is drawn from a normal population with zero 
mean the frequency element may be written as 

—— p n-1 e-*"' dp d<f >0 sin <f> x d<f> x sin* <f> 2 d<f» t . . . sin"” 2 ^> n _ 2 d^ n _ 2 . 
r\ x \ 

Hence if w = —■—-, where a* is the sample variance, the distribution of w is independent 
na 

/2 

of that of a. Hence show that for the distribution of w, writing a — ./ 


Pi 


>_{ r(jn + l)}« 2” 
r (n + 1) y/n 


fit — — 2 {» (1) + a*» (2) } 


2n< l > + 3n (2) + o* n< 3 > } j 

u t = — { 3» (1) + (8<z 2 + 3) tc< 2 > + 6 a 2 w< 3 > + a* »< 4 >} /!L±J?. 
n* v / n 

Hence show that for $n = 50$, $\sqrt{\beta_1} = -0.24$ and $\beta_2 = 3.10$, indicating fairly rapid tendency
to normality.

(Geary, 1935a.)


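21.3. Show that in samples from a normal bivariate population

$$dF \propto \exp\left[ -\frac{1}{2(1 - \rho^2)} \left\{ \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1 \sigma_2} + \frac{y^2}{\sigma_2^2} \right\} \right] dx\, dy,$$

the functions

$$u = \frac{x}{\sigma_1} + \frac{y}{\sigma_2}, \qquad v = \frac{x}{\sigma_1} - \frac{y}{\sigma_2}$$

are distributed independently and that their correlation coefficient $R$ may be written

$$R = \frac{a - \alpha}{\sqrt{\{ (a + \alpha)^2 - 4a\alpha r^2 \}}},$$

where

$$\alpha = \frac{\sigma_1^2}{\sigma_2^2}, \qquad a = \frac{\sum (x - \bar{x})^2}{\sum (y - \bar{y})^2},$$

and $r$ is the correlation between the observed $x$'s and $y$'s. Hence show that

$$t = \frac{R \sqrt{(n - 2)}}{\sqrt{(1 - R^2)}} = \frac{(a - \alpha) \sqrt{(n - 2)}}{\sqrt{\{ 4 (1 - r^2)\, a \alpha \}}}$$

is distributed as “Student’s” $t$ with $n - 2$ degrees of freedom. Show how to test the
ratio $\alpha$ from this result.

(Pitman, 1939c. The test has the remarkable property of being independent of the
parent correlation $\rho$.)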


21.4. If an even number $n$ of members of a sample come from a population with
mean $\mu$, show how to find a sample of half the size distributed with twice the variance
about zero mean. Hence show how to extend the result of Exercise 21.2 to the case where
the population mean is not zero.


21.5. If a parameter admits of a sufficient estimator, show that a test of its significance 
can be derived direct from the likelihood function. 


21.6. Derive equations (21.47) and (21.48). 
21.7. Let $l_{11}, l_{12}, \ldots, l_{1,n-1}$ be $(n - 1)$ linear functions of the observations which
are orthogonal to one another and to $\bar{x}_1$, and let them have zero mean and variance $\sigma_1^2$.
Similarly define $l_{21}, \ldots, l_{2,n-1}$.

Then, in two samples of $n$ from normal populations with equal means and variances
$\sigma_1^2$ and $\sigma_2^2$, the function

$$\frac{\sqrt{n}\, (\bar{x}_1 - \bar{x}_2)}{\left\{ \sum_j (l_{1j} + l_{2j})^2 / (n - 1) \right\}^{\frac{1}{2}}}$$

will be distributed as “Student’s” $t$ with $n - 1$ degrees of freedom.

(Bartlett, 1937c, and Welch, 1938b. The test does not depend on the ratio $\sigma_1^2/\sigma_2^2$ and
can be extended to the case of unequal sample numbers, but only at the expense of losing
efficiency in the sense that the degrees of freedom number one less than the lower of the sample
numbers.)

21.8. Given two samples of $n_1$, $n_2$ members from normal populations with unequal
variances, show that by picking $n_1$ members at random from the $n_2$ (where $n_2 > n_1$) and
pairing them at random with the members of the first sample, a test of significance of
difference of means can be based on “Student’s” distribution independently of the variance
ratio in the populations. (This test, again, is exact, but sacrifices the information of
$n_2 - n_1$ members of the second sample.)


21.9. If z is the ratio of the sample mean to sample standard deviation in normal 
samples, and n is large enough for the distribution of the variance to be regarded as normal, 
show that 


vW^ {i r+ 2 T „ _iy } 


is distributed approximately normally with zero mean and unit variance, where 



7 

32n* ’ 

(Hendricks, 1936.) 


21.10. If $x$, $y$ have a continuous frequency function $f(x, y)$, their characteristic
function is

$$\phi(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(iux + ivy)\, f(x, y)\, dx\, dy.$$

Show that the distribution of $x$ when $y$ is given has a characteristic function

$$\phi(u \mid y) = \frac{\int_{-\infty}^{\infty} e^{-iyv}\, \phi(u, v)\, dv}{\int_{-\infty}^{\infty} e^{-iyv}\, \phi(0, v)\, dv}.$$

(Bartlett, 1938b.)


21.11. If a set of parameters $\theta_1 \ldots \theta_p$ admit of a set of sufficient estimators, show
that conditional inferences independent of $\theta_1 \ldots \theta_p$ are possible, the conditions being
that the estimators are constant for the samples concerned. Conversely, if conditional
inference is possible, the irrelevant parameters must admit a set of sufficient estimators.

(Bartlett, 1937c.) 

21.12. In a normal sample of n values show that if 

_ Si - a, 

V(2n) 

n 

and ns' 2 = lx 2 — nx' 2 = l (x t + a;,) 8 -f £ xf, 

where x u x % are two sample values taken at random, then 



X 

is distributed in the same form as “ Student’s ” ratio z = -, when the parent mean is 
zero. Show further that 

I *C | < 1. 

(Neyman, Lectures and Conferences on Mathematical Statistics, 1938. The example shows
that if z is “significantly” large, C must be small and hence the two criteria based on z and C
lead to opposite conclusions.)

21.13. In a 2 x 2 contingency table, show that the border relative frequencies 
are, on the hypothesis of independence, sufficient estimators for the probability of success 
of the two attributes defining the table. Hence derive the exact test of significance in 
such a table as a conditional inference. (The exact test is given in 12.16, vol. I, p. 303.) 

(Bartlett, 1937c.) 

21.14. If two samples are drawn from a bivariate normal population, and 
are their covariances, F u and F is are the variances of the pooled samples, and F M its 
covariance, show that the distribution function 

F (v i„ | F«, F lto F„) 

is independent of the parent variances and correlation. Hence that the distribution 
would provide a test of the difference of sample covariances. 

(Bartlett, 1937c.)

21.15. If two samples $x_1 \ldots x_m$ and $y_1 \ldots y_n$ are drawn from populations which
differ only in location and the difference in means is $d$, show by considering the values
typified by $x + d$ and $y$ how to set confidence limits to $d$, based on the distribution of
$w$ of equation (21.82).

(Pitman, 1937a.) 

21.16. In the previous exercise show that the confidence limits for d are the same 
as those based on “ Student’s ” distribution in the case of normal populations with different 
means and identical variances (equation (21.32)). Explain why the latter test is only 
valid for normal populations, whereas the former is valid for any population. 
The Analytical Theory of Regression

22.1. When considering the theory of correlation in Chapters 14 and 15 we introduced
the concept of linear regression of one variate on a set of “independent” variates. We
shall now study this subject more fully and extend the theory to the case where the regression
lines are not straight. In the first instance we confine our attention to bivariate
populations, but the majority of our results are easily generalised to the multivariate case.

In speaking of one variate as “dependent” and the others as “independent” we
introduce what may be a source of confusion. In general, all the variates are dependent
in the statistical sense, each on the others, and in special cases may even be functionally
dependent. In selecting one for separate consideration and in discussing its dependence
on the others we are usually attempting to solve a problem in estimation: for given values
of the other variates, what is the best estimator of the “dependent” variate, or its central
value in the distribution which it has for such given values? The idea of “given” values,
that is to say values which can be selected at will, leads to our referring to them as “independent”,
though they may be statistically dependent on one another. It might perhaps
be better to use different words, but the practice is so common that we make no attempt
to improve it. Once the point has been understood no difficulty arises in practice.

22.2. If we have two variates $x$, $y$ with frequency function $f(x, y)$, then for any
fixed value of $y$ the mean of $x$, say $\bar{x}_y$, is given by

$$\bar{x}_y = \int_{-\infty}^{\infty} x f(x, y)\, dx \bigg/ \int_{-\infty}^{\infty} f(x, y)\, dx. \tag{22.1}$$

The expression on the right is a function of $y$ and thus the points whose co-ordinates
are $(\bar{x}_y, y)$ have a locus which is, in general, a smooth curve. This curve is defined as the
line of regression of $x$ on $y$, and may be written

$$X = \frac{\int_{-\infty}^{\infty} x f(x, Y)\, dx}{\int_{-\infty}^{\infty} f(x, Y)\, dx}, \tag{22.2}$$

where $X$, $Y$ are the current co-ordinates. Similarly there will be a line of regression of
$y$ on $x$ given by

$$Y = \frac{\int_{-\infty}^{\infty} y f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}. \tag{22.3}$$

We shall take $Y$ to represent the dependent variate throughout this chapter.
22.3. We may also consider the more general curves typified by

$$Y = \frac{\int_{-\infty}^{\infty} y^r f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}, \tag{22.4}$$

the regression now being of the $r$th moment of $y$ on $x$. If $r = 1$ we have the regression
of the first moment, or simply the regression. If $r = 2$ and $y$ is measured from the mean
we have the so-called scedastic curve of $y$ on $x$,

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar{y}_x)^2 f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}, \tag{22.5}$$

which shows how the variance of $y$ varies with $x$. Other forms which have been studied
are the clitic curve

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar{y}_x)^3 f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy} \tag{22.6}$$

and the kurtic curve

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar{y}_x)^4 f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}. \tag{22.7}$$

These curves correspond to the moments of a univariate distribution, and the main
characteristics of a bivariate form may be studied with their aid in much the same way
as the lower moments can be used to summarise the properties of a univariate form.
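
For numerically specified material the integrals of 22.2 and 22.3 become sums over the arrays of a bivariate frequency table. The sketch below is our own discrete analogue, not a formula from the text: the function name `array_moments` and the small frequency table are hypothetical, and the clitic and kurtic values are taken as the third and fourth moments of each array about its own mean.

```python
def array_moments(xs, ys, freq):
    """Regression, scedastic, clitic and kurtic values of y for each x-array.

    freq[i][j] is the frequency of the pair (xs[i], ys[j]).
    """
    curves = {}
    for i, x in enumerate(xs):
        total = sum(freq[i])
        if total == 0:
            continue  # an empty array contributes no point to the curves
        mean = sum(f * y for f, y in zip(freq[i], ys)) / total
        central = {
            r: sum(f * (y - mean) ** r for f, y in zip(freq[i], ys)) / total
            for r in (2, 3, 4)
        }
        curves[x] = {
            "regression": mean,
            "scedastic": central[2],
            "clitic": central[3],
            "kurtic": central[4],
        }
    return curves

# Hypothetical 3 x 3 frequency table, for illustration only.
xs = [0, 1, 2]
ys = [0, 1, 2]
freq = [[2, 1, 1],
        [1, 2, 1],
        [1, 1, 2]]
curves = array_moments(xs, ys, freq)
```

Plotting the four values against x gives the discrete counterparts of the regression, scedastic, clitic and kurtic curves.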


22.4. It is interesting to remark that, just as we can find the moments direct from 
the characteristic function, so also we may ascertain the regressions of moments from 
the bivariate characteristic function, even when the distribution function itself is not 
explicitly given. 

Let us write the frequency function in the form

$$f(x, y) = g(x)\, g_x(y), \tag{22.8}$$

where $g(x)$ is the total frequency for any given $x$ and $g_x(y)$ is the frequency of $y$ for any
given $x$. In the notation of the theory of probability we should write this

$$f(x, y) = g(x)\, g(y \mid x).$$

The characteristic function of $x$ and $y$ is then

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\{ it_1 x + it_2 y \}\, g(x)\, g_x(y)\, dx\, dy = \int_{-\infty}^{\infty} e^{it_1 x} g(x)\, \phi_x(t_2)\, dx \tag{22.9}$$

where

$$\phi_x(t_2) = \int_{-\infty}^{\infty} e^{it_2 y} g_x(y)\, dy \tag{22.10}$$

and is the c.f. of $y$ for a given $x$.

If the $r$th moment of $y$ about the origin for a given $x$ is $\mu'_{rx}$, we have

$$\mu'_{rx} = \frac{1}{i^r} \left[ \frac{\partial^r \phi_x(t_2)}{\partial t_2^r} \right]_{t_2 = 0}$$

and hence, from (22.9),

$$\left[ \frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r} \right]_{t_2 = 0} = i^r \int_{-\infty}^{\infty} e^{it_1 x} g(x)\, \mu'_{rx}\, dx. \tag{22.11}$$

Thus, by the Inversion Theorem,

$$g(x)\, \mu'_{rx} = \frac{1}{2\pi i^r} \int_{-\infty}^{\infty} e^{-it_1 x} \left[ \frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r} \right]_{t_2 = 0} dt_1, \tag{22.12}$$

subject, of course, to conditions of existence. This gives us the required expression for
$\mu'_{rx}$ in terms of $x$, and the regression can be written down at once.


22.5. Since

$$\phi(t_1, t_2) = \exp\left\{ \sum_{r, s} \kappa_{rs} \frac{(it_1)^r (it_2)^s}{r!\, s!} \right\}$$

we have

$$\left[ \frac{\partial \phi}{\partial t_2} \right]_{t_2 = 0} = i\, \phi(t_1, 0) \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!} \tag{22.13}$$

and $\phi(t_1, 0)$ may be written $\phi(t_1)$, being the characteristic function of $g(x)$. We also
have, subject to existence conditions,

$$(-D)^j g(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it_1 x} (it_1)^j \phi(t_1)\, dt_1. \tag{22.14}$$

Hence, from (22.12), (22.13) and (22.14) we find

$$g(x)\, \mu'_{1x} = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} (-D)^j g(x), \tag{22.15}$$

provided that the interchange of summation and integration in the last step is legitimate.
Thus we have, for the regression of the mean,

$$Y = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} \left[ \frac{(-D)^j g(x)}{g(x)} \right]_{x = X}. \tag{22.16}$$

This notable result is due to Wicksell (1934b). The expansion is valid if the cumulants
exist and if $g(x)$ and its derivatives are continuous in the range and zero at its extremes;
for then the interchange of summation and integration in arriving at (22.15) is legitimate.
In particular, if $g(x)$ is normal and in standard measure we have

$$Y = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} H_j(X), \tag{22.17}$$

where $H_j(x)$ is the Tchebycheff-Hermite polynomial of order $j$ (6.20, vol. I, p. 146).
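Example 22.1

For the bivariate normal distribution about the mean we have

$$\phi(t_1, t_2) = \exp\{ -\tfrac{1}{2} (\sigma_1^2 t_1^2 + 2\rho \sigma_1 \sigma_2 t_1 t_2 + \sigma_2^2 t_2^2) \}.$$

Hence

$$\left[ \frac{\partial \phi}{\partial t_2} \right]_{t_2 = 0} = -\rho \sigma_1 \sigma_2 t_1 \exp(-\tfrac{1}{2} \sigma_1^2 t_1^2),$$

and from (22.12)

$$g(x)\, \mu'_{1x} = \frac{i \rho \sigma_1 \sigma_2}{2\pi} \int_{-\infty}^{\infty} t_1 \exp\{ -\tfrac{1}{2} \sigma_1^2 t_1^2 - it_1 x \}\, dt_1 = \frac{\rho \sigma_2}{\sigma_1^3 \sqrt{(2\pi)}}\, x\, e^{-x^2 / 2\sigma_1^2}.$$

Hence

$$\mu'_{1x} = \rho \frac{\sigma_2}{\sigma_1} x$$

and

$$Y = \rho \frac{\sigma_2}{\sigma_1} X,$$

the familiar relation of linearity for the regression of the mean of the normal distribution.
Alternatively, direct from (22.17) we have, since $\kappa_{j1} = 0$, $j > 1$, and with $x$ expressed in standard measure,

$$Y = \kappa_{01} + \kappa_{11} H_1\left( \frac{X}{\sigma_1} \right) = \rho \frac{\sigma_2}{\sigma_1} X, \text{ as before.}$$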
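
The linearity found in Example 22.1 can be checked numerically by evaluating the ratio of integrals (22.3) directly. The following is a rough sketch under our own assumptions: the function names, the midpoint rule, the integration range and the parameter values are illustrative choices and form no part of the analytical argument.

```python
import math

def bivariate_normal_density(x, y, s1, s2, rho):
    """Bivariate normal frequency function about the mean."""
    q = (x / s1) ** 2 - 2.0 * rho * (x / s1) * (y / s2) + (y / s2) ** 2
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho ** 2)
    return math.exp(-q / (2.0 * (1.0 - rho ** 2))) / norm

def regression_of_mean(X, s1, s2, rho, half_width=12.0, steps=4000):
    """Midpoint-rule value of (22.3): integral of y f(X, y) dy over integral of f(X, y) dy."""
    lo = -half_width * s2
    h = 2.0 * half_width * s2 / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        f = bivariate_normal_density(X, y, s1, s2, rho)
        num += y * f
        den += f
    return num / den

# Illustrative parameter values (assumed, not from the text).
s1, s2, rho = 2.0, 3.0, 0.6
numerical = regression_of_mean(1.5, s1, s2, rho)
analytical = rho * s2 / s1 * 1.5  # the linear regression found in the example
```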


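Example 22.2 (Wicksell, 1934b)

Consider the frequency distribution of $\xi = \frac{1}{2} \sum (x^2)$ and $\eta = \frac{1}{2} \sum (y^2)$ where $x$, $y$ are
samples of $n$ from the bivariate normal population

$$dF \propto \exp\left[ -\frac{1}{2(1 - \rho^2)} \{ x^2 - 2\rho xy + y^2 \} \right] dx\, dy.$$

The characteristic function is

$$\phi = \left[ \int\!\!\int \exp(\tfrac{1}{2} x^2 \theta_1 + \tfrac{1}{2} y^2 \theta_2)\, dF \right]^n = \{ (1 - \theta_1)(1 - \theta_2) - \rho^2 \theta_1 \theta_2 \}^{-\frac{1}{2}n},$$

where $\theta_1 = it_1$ and $\theta_2 = it_2$.

The distribution function cannot be expressed in a simple form, but we may determine
the regressions without it. We have

$$\left[ \frac{\partial^r \phi}{\partial \theta_2^r} \right]_{\theta_2 = 0} = \frac{n}{2} \left( \frac{n}{2} + 1 \right) \ldots \left( \frac{n}{2} + r - 1 \right) \frac{\{ 1 - (1 - \rho^2) \theta_1 \}^r}{(1 - \theta_1)^{\frac{1}{2}n + r}}.$$

Thus, from (22.12)

$$g(\xi)\, \mu'_{r\xi} = \frac{\Gamma(\frac{1}{2}n + r)}{2\pi\, \Gamma(\frac{1}{2}n)} \int_{-\infty}^{\infty} e^{-\theta_1 \xi}\, \frac{\{ 1 - (1 - \rho^2) \theta_1 \}^r}{(1 - \theta_1)^{\frac{1}{2}n + r}}\, dt_1.$$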
The integrals may be evaluated by successive application of

j_r 1 

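and we find, for the regression of $\eta$ on $\xi$,

$$\mu'_{1\xi} = \frac{n}{2} + \rho^2 \left( \xi - \frac{n}{2} \right)$$

$$\mu_{2\xi} = \mu'_{2\xi} - (\mu'_{1\xi})^2 = (1 - \rho^2) \{ \tfrac{1}{2} n (1 - \rho^2) + 2\rho^2 \xi \}.$$

Thus the regressions of both mean and variance of $\eta$ on $\xi$ are linear.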


Fitting of Curvilinear Regression Lines 


22.6. From the practical point of view the case we have just considered, namely, 
the one where the distribution or characteristic function is given, is exceptional. The 
determination of regression curves has, in the majority of cases, to be carried out from 
numerically specified material, which we shall consider in the remainder of the chapter. 
We shall confine our attention to the regression of the mean. 

In general the means of arrays will not lie exactly on a smooth curve (unless of course
we choose a curve of order equal to the number of points to be fitted, less one). Nor do
we know a priori what is the appropriate degree of a polynomial which will approximately
represent the regression line. Let us, however, assume that the regression can
be represented by a polynomial of order $p$:

$$Y = a_0 + a_1 X + a_2 X^2 + \ldots + a_p X^p. \tag{22.18}$$

We will consider later how the appropriate value of $p$ is to be determined in particular
cases. Our problem is to determine the coefficients $a$ from the data. As usual, we appeal
to the principle of least squares, that is to say, we find the values of the $a$'s which will
minimise

$$U = \sum (y - a_0 - a_1 x - \ldots - a_p x^p)^2, \tag{22.19}$$

the summation extending over the sample values.

Differentiating with respect to $a_j$ we have

$$\sum (x^j y) - a_0 \sum x^j - a_1 \sum x^{j+1} - \ldots - a_p \sum x^{j+p} = 0,$$

and similar equations for $j = 0, \ldots, p$. Writing the moments without primes for simplicity
and letting $\mu_j$ represent the $j$th moment of $x$, and $\mu_{j1}$ the bivariate moment
$\sum (x^j y)$, we have

$$\begin{aligned}
a_0 \mu_0 + a_1 \mu_1 + \ldots + a_p \mu_p &= \mu_{01} \\
a_0 \mu_1 + a_1 \mu_2 + \ldots + a_p \mu_{p+1} &= \mu_{11} \\
\ldots \qquad & \\
a_0 \mu_p + a_1 \mu_{p+1} + \ldots + a_p \mu_{2p} &= \mu_{p1}
\end{aligned} \tag{22.20}$$

Writing now

$$\Delta^{(p)} = \begin{vmatrix}
\mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\vdots & \vdots & & \vdots \\
\mu_p & \mu_{p+1} & \ldots & \mu_{2p}
\end{vmatrix} \tag{22.21}$$

and $\Delta_j^{(p)}$ for the determinant obtained by substituting the product-moments $\mu_{01}, \ldots, \mu_{p1}$
for the $(j + 1)$th column, we have, as the solution of (22.20),

$$a_j = \frac{\Delta_j^{(p)}}{\Delta^{(p)}}. \tag{22.22}$$
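
As an illustration of equations (22.20), the following sketch forms the moments from a set of observations and solves the system in exact rational arithmetic. It is a sketch under our own assumptions: the function name `fit_polynomial_by_moments` and the data are hypothetical, the moments are taken as sums divided by the number of observations (which leaves the solution unchanged), and elimination is used in place of the determinantal form (22.22), to which it is algebraically equivalent.

```python
from fractions import Fraction

def fit_polynomial_by_moments(xs, ys, p):
    """Solve equations (22.20) for a_0 ... a_p by Gauss-Jordan elimination."""
    n = len(xs)
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # mu[j] is the jth moment of x; mu1[j] is the product-moment of x^j and y.
    mu = [sum(x ** j for x in xs) / n for j in range(2 * p + 1)]
    mu1 = [sum(x ** j * y for x, y in zip(xs, ys)) / n for j in range(p + 1)]
    # Augmented matrix: row r holds mu_r ... mu_{r+p} followed by mu_{r1}.
    a = [[mu[r + c] for c in range(p + 1)] + [mu1[r]] for r in range(p + 1)]
    for col in range(p + 1):
        # A non-zero pivot exists when there are more than p distinct x values.
        pivot = next(r for r in range(col, p + 1) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(p + 1):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[r][p + 1] for r in range(p + 1)]

# Hypothetical data lying exactly on Y = 1 + 2X + 3X^2.
xs = [-2, -1, 0, 1, 2]
ys = [9, 2, 1, 6, 17]
quadratic = fit_polynomial_by_moments(xs, ys, 2)
linear = fit_polynomial_by_moments(xs, ys, 1)
```

Here the linear fit gives coefficients 7 and 2 and the quadratic fit gives 1, 2 and 3: the constant term changes when the order is raised, so the whole system has to be solved again for each order.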

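22.7. It might appear that this solution could break down if $\Delta^{(p)} = 0$. Such a
contingency cannot arise. For, writing $G(x)$ for the distribution function of $x$, we have

$$\mu_j = \int x^j\, dG$$

or, if $x_0, \ldots, x_p$ are separate variables of integration with $dG_j = dG(x_j)$, we have for $\Delta^{(p)}$

$$\Delta^{(p)} = \int \ldots \int \begin{vmatrix}
1 & x_0 & \ldots & x_0^p \\
x_1 & x_1^2 & \ldots & x_1^{p+1} \\
\vdots & \vdots & & \vdots \\
x_p^p & x_p^{p+1} & \ldots & x_p^{2p}
\end{vmatrix} dG_0\, dG_1 \ldots dG_p = \int \ldots \int x_1 x_2^2 \ldots x_p^p\, D\, dG_0\, dG_1 \ldots dG_p,$$

where

$$D = \begin{vmatrix}
1 & x_0 & \ldots & x_0^p \\
1 & x_1 & \ldots & x_1^p \\
\vdots & \vdots & & \vdots \\
1 & x_p & \ldots & x_p^p
\end{vmatrix}.$$

If we now permute the suffixes of the $x$'s in all possible ways and sum the $(p + 1)!$ resultants
we obtain, in virtue of the definition of a determinant,

$$(p + 1)!\, \Delta^{(p)} = \int \ldots \int D^2\, dG_0\, dG_1 \ldots dG_p, \tag{22.23}$$

and hence $\Delta^{(p)}$ is essentially positive.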


22.8. From (22.18) and (22.22) we see that the regression line may be written

$$\begin{vmatrix}
Y & 1 & X & \ldots & X^p \\
\mu_{01} & \mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_{11} & \mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\vdots & \vdots & \vdots & & \vdots \\
\mu_{p1} & \mu_p & \mu_{p+1} & \ldots & \mu_{2p}
\end{vmatrix} = 0. \tag{22.24}$$

This is a formal solution of our problem. The moments $\mu$ can be obtained from observation,
and equation (22.24) then gives the regression line.

It will be observed that in order to preserve the symmetry we have written $\mu_0$ for
the total frequency unity.

22.9. A somewhat different approach leads to the same solution. If we assume
that the regression line is a parabolic curve of order $p$, we may find the coefficients by the
principle of moments. This would lead us to identify the lower moments

$$\sum (x^j y) = \sum x^j (a_0 + a_1 x + \ldots + a_p x^p)$$

as far as was necessary to determine the $a$'s. This clearly leads back to equation (22.20).

Orthogonal Polynomials

22.10. The use of equation (22.24) in practice is subject to one serious drawback.
If we have a set of data and no guide, apart from inspection, to the appropriate value of
$p$, the only course is to fit curves of order 1, 2, 3, . . . and so forth, until we reach the point
when further terms do not improve the fit. Every time we add a new term the determinantal
arithmetic has to be done afresh. To obviate this nuisance we shall consider the
regression line in the form

$$Y = b_0 P_0 + b_1 P_1 + \ldots + b_p P_p \tag{22.25}$$

where the $P$'s are polynomials in $X$, $P_j$ being of degree $j$. We shall determine the $P$'s
so that

$$\sum (P_j P_k) = 0, \qquad j \neq k, \tag{22.26}$$

the summation extending over the observed values.

In minimising

$$\sum (y - b_0 P_0 - b_1 P_1 - \ldots - b_p P_p)^2,$$

we shall have equations such as

$$\sum (y P_j) - b_0 \sum (P_0 P_j) - \ldots - b_p \sum (P_p P_j) = 0,$$

and in virtue of the orthogonal relations (22.26), this reduces to

$$\sum (y P_j) - b_j \sum (P_j^2) = 0. \tag{22.27}$$

Thus $b_j$ is determined simply by $P_j$; and if, having fitted a curve of order $p$, we wish to
go a step farther and add a term $b_{p+1} P_{p+1}$, the coefficients $b_0 \ldots b_p$ found from (22.27)
remain unaltered.

22.11. Furthermore, the use of these orthogonal polynomials will give us a very 
convenient method of determining step by step the goodness of fit of the regression line. 
We have 

$$U = \sum (y - b_0 P_0 - \dots - b_p P_p)^2$$
$$\quad = \sum (y^2) - 2 b_0 \sum (y P_0) - \dots - 2 b_p \sum (y P_p) + b_0^2 \sum (P_0^2) + \dots + b_p^2 \sum (P_p^2).$$

But from (22.27) we may express $\sum (y P_j)$ in terms of $\sum (P_j^2)$, and we thus find

$$U = \sum (y^2) - b_0^2 \sum (P_0^2) - b_1^2 \sum (P_1^2) - \dots - b_p^2 \sum (P_p^2). \qquad (22.28)$$

Thus the effect of any term $b_j P_j$ is to reduce U by $b_j^2 \sum (P_j^2)$, and we may examine the effect
of this term on U separately. If we find that the addition of any term $b_p P_p$ does not
reduce U significantly, we may conclude that it is redundant (so far as concerns the 
representation of a regression line by a polynomial). 
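The two properties just stated can be checked numerically. The following is only a sketch: the data are made-up illustrative values, the function names `orthogonal_basis` and `fit` are hypothetical, and the polynomials are built by Gram-Schmidt orthogonalisation of the powers of x over the sample (a convenient substitute for the determinantal expressions derived later, not the text's own procedure).

```python
import numpy as np

def orthogonal_basis(x, order):
    """Columns P_0..P_order evaluated at x: monic and orthogonal over the sample."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(order + 1):
        v = x ** j
        for q in cols:                      # remove components along earlier P's
            v = v - (v @ q) / (q @ q) * q
        cols.append(v)
    return np.column_stack(cols)

def fit(x, y, order):
    """Return the b's of (22.27) and U of (22.28) after each successive term."""
    P = orthogonal_basis(x, order)
    norms = np.sum(P * P, axis=0)           # sum of P_j^2 over the sample
    b = (P.T @ y) / norms                   # b_j = sum(y P_j) / sum(P_j^2)
    U = y @ y - np.cumsum(b ** 2 * norms)   # each term lowers U by b_j^2 sum(P_j^2)
    return b, U

# Illustrative values only (not taken from the text).
x = np.array([0.0, 1.0, 2.0, 4.0, 7.0, 8.0, 11.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 8.8, 9.7, 15.1])

b2, U2 = fit(x, y, 2)
b3, U3 = fit(x, y, 3)
print(b2, b3[:3])   # the first three coefficients agree
print(U3)           # residual sum of squares after each added term
```

Adding the cubic term leaves the earlier coefficients untouched, and the last entry of `U3` agrees with the residual sum of squares of an ordinary least-squares cubic.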

22.12. We proceed then to derive expressions for the orthogonal polynomials in the 
general case. Later we shall examine the important special case when the values of x 
are equidistant (as, for instance, with grouped .data and most time-series). 

Put 

$$P_p = \sum_{j=0}^{p} c_{pj} X^j. \qquad (22.29)$$

In this expression there are (p + 1) unknown constants c, and hence in all the polynomials
up to and including those of the pth order there are $\tfrac{1}{2}(p + 1)(p + 2)$ constants. The
orthogonal relations up to and including order p will then provide $\tfrac{1}{2}p(p + 1)$ conditions
on the c’s, so that p + 1 constants are assignable at will. We will take one for each P and
assign it so that the coefficient of $X^j$ in $P_j$ is unity:

$$c_{jj} = 1. \qquad (22.30)$$

In particular $c_{00} = P_0 = 1$. The orthogonal relations are then just sufficient to determine
the other c’s. For instance, for the set $c_{pj}$, j = 0, . . ., p − 1, they are

$$\sum P_p P_0 = \sum P_p = 0$$
$$\sum P_p P_1 = 0$$

and so on. This system is clearly equivalent to the p equations

$$\sum P_p = 0, \quad \sum x P_p = 0, \quad \dots, \quad \sum x^{p-1} P_p = 0. \qquad (22.31)$$

On substituting for the P’s from (22.29) we get

$$c_{p0} \mu_0 + c_{p1} \mu_1 + \dots + c_{p,p-1} \mu_{p-1} + \mu_p = 0$$
$$c_{p0} \mu_1 + c_{p1} \mu_2 + \dots + c_{p,p-1} \mu_p + \mu_{p+1} = 0$$
$$\dots$$
$$c_{p0} \mu_{p-1} + c_{p1} \mu_p + \dots + c_{p,p-1} \mu_{2p-2} + \mu_{2p-1} = 0.$$

The solution may be expressed as a determinant in the usual way. Writing $\Delta^{(p-1)}$ in accordance
with (22.21) and $\Delta^{(p)}_{pj}$ for the minor of the term in the last row and (j + 1)th column
in (22.21), we find

$$c_{pj} = \frac{\Delta^{(p)}_{pj}}{\Delta^{(p-1)}}. \qquad (22.32)$$


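This expresses the c’s in terms of the ascertainable constants $\mu$. It follows that

$$P_p = \frac{1}{\Delta^{(p-1)}} \begin{vmatrix} \mu_0 & \mu_1 & \dots & \mu_p \\ \mu_1 & \mu_2 & \dots & \mu_{p+1} \\ \vdots & \vdots & & \vdots \\ \mu_{p-1} & \mu_p & \dots & \mu_{2p-1} \\ 1 & X & \dots & X^p \end{vmatrix}. \qquad (22.33)$$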


We notice in particular that, in virtue of the diagonal symmetry of A M , we have 


c jk ~ c kJ- 

22.13. In virtue of (22.31) we have 

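$$\sum (P_p^2) = \sum (x^p P_p) \qquad (22.34)$$

and thus, from (22.33) on multiplying the last row by $x^p$ and summing,

$$\sum (P_p^2) = n \frac{\Delta^{(p)}}{\Delta^{(p-1)}}. \qquad (22.35)$$

Similarly

$$\sum (y P_p) = \frac{n}{\Delta^{(p-1)}} \begin{vmatrix} \mu_0 & \mu_1 & \dots & \mu_p \\ \vdots & \vdots & & \vdots \\ \mu_{p-1} & \mu_p & \dots & \mu_{2p-1} \\ \mu_{01} & \mu_{11} & \dots & \mu_{p1} \end{vmatrix}, \qquad (22.36)$$

where $\mu_{j1}$ denotes the product-moment $\frac{1}{n} \sum (x^j y)$.

Finally, from (22.27),

$$b_p = \frac{\sum (y P_p)}{\sum (P_p^2)}, \qquad (22.37)$$

that is, the determinant appearing in (22.36) divided by $\Delta^{(p)}$.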

Our problem is now solved. We have expressed all the unknowns in terms of 
calculable determinants. 

We may note in passing that since the regression equation must remain covariant
under a change of origin, all the coefficients b except $b_0$ are seminvariant, and the origin
can thus be chosen at will. $b_0$ itself is the mean of the y-values.

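22.14. Explicitly for the polynomials we have (taking $\mu_1 = 0$, $\mu_2 = 1$):

$$P_0 = 1 \qquad (22.38)$$

$$P_1 = \begin{vmatrix} 1 & 0 \\ 1 & X \end{vmatrix} = X \qquad (22.39)$$

$$P_2 = \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & X & X^2 \end{vmatrix} \div \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = X^2 - \mu_3 X - 1 \qquad (22.40)$$

$$P_3 = \begin{vmatrix} 1 & 0 & 1 & \mu_3 \\ 0 & 1 & \mu_3 & \mu_4 \\ 1 & \mu_3 & \mu_4 & \mu_5 \\ 1 & X & X^2 & X^3 \end{vmatrix} \div \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & \mu_3 & \mu_4 \end{vmatrix}$$
$$\quad = \frac{1}{\mu_4 - \mu_3^2 - 1} \{ (\mu_4 - \mu_3^2 - 1) X^3 - (\mu_5 - \mu_3 \mu_4 - \mu_3) X^2 + (\mu_3 \mu_5 - \mu_4^2 + \mu_4 - \mu_3^2) X + (\mu_5 - 2 \mu_3 \mu_4 + \mu_3^3) \} \qquad (22.41)$$

and so on. In particular, if the population is normal,

$$P_1 = X$$
$$P_2 = X^2 - 1$$
$$P_3 = X^3 - 3X, \text{ etc.},$$

the polynomials in this case reducing to the Tchebycheff-Hermite functions (6.20) which
we know to form an orthogonal system in the normal case.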
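As a check on these expressions, the equations (22.31) may be solved numerically for the coefficients $c_{pj}$ once the moments are given. The sketch below assumes the moments are supplied as a list `mu` with `mu[0] = 1`; the function name `monic_orthogonal_coeffs` is hypothetical, and `numpy.linalg.solve` takes the place of the determinantal solution (22.32).

```python
import numpy as np

def monic_orthogonal_coeffs(mu, p):
    """Solve (22.31) for c_{p0}, ..., c_{p,p-1}, given moments mu[0..2p-1].

    Row i of the system is  sum_j c_{pj} mu[i+j] + mu[i+p] = 0,  i = 0..p-1.
    """
    mu = np.asarray(mu, dtype=float)
    A = np.array([[mu[i + j] for j in range(p)] for i in range(p)])
    rhs = -mu[p:2 * p]
    return np.linalg.solve(A, rhs)

# Standardised normal moments: mu_0..mu_5 = 1, 0, 1, 0, 3, 0.
print(monic_orthogonal_coeffs([1, 0, 1, 0, 3, 0], 3))   # coefficients of 1, X, X^2 in P_3

# A skew case with mu_3 = 0.5 (an arbitrary illustrative value).
print(monic_orthogonal_coeffs([1, 0, 1, 0.5], 2))       # coefficients of 1, X in P_2
```

For the normal moments the solution reproduces $P_3 = X^3 - 3X$; for the second call it reproduces $P_2 = X^2 - \mu_3 X - 1$ of (22.40).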


Example 22.3. Ungrouped Data

Table 22.1 shows the relationship between the percentage loss in weight (Y) and the
temperature (X) in a number of samples of soil. We require to find the regression of Y on X.

TABLE 22.1

Fitting of Curvilinear Regression for Ungrouped Data
(Data from J. R. H. Coutts, J. Agr. Sci., 20, 541.)

Percentage Loss in Weight.    Temperature (degrees F.).
            Y                           X

          3.71                         100
          3.81                         105
          3.86                         110
          3.93                         115
          3.96                         121
          4.20                         132
          4.34                         144
          4.51                         153
          4.73                         163
          5.35                         179
          5.74                         191
          6.14                         203
          6.51                         212
          6.98                         226
          7.44                         237
          7.76                         251


For the sums required we find— 

n = 16, $\sum (y)$ = 82.97, $\sum (y^2)$ = 459.4363;

$\sum (x)$ = 2642, $\sum (x^2)$ = 474,050, $\sum (x^3)$ = 91,244,582;
$\sum (x^4)$ = 18,553,164,842, $\sum (x^5)$ = 3,930,294,225,302;
$\sum (x^6)$ = 858,077,068,7[?],250; $\sum (yx)$ = 14,736.19;

$\sum (yx^2)$ = 2,819,909.45, $\sum (yx^3)$ = 571,902,362.11.


These can be run off fairly quickly on a machine. We have not bothered to take a different 
mean from those given, but in general a certain amount of arithmetic can be saved by 
so doing. 

Considering first of all the straightforward approach of (22.24), we have for the straight
line of closest fit,

$$\begin{vmatrix} Y & 1 & X \\ 82.97 & 16 & 2642 \\ 14{,}736.19 & 2642 & 474{,}050 \end{vmatrix} = 0,$$

reducing to

$$Y = 0.660 + 2.741 \left( \frac{X}{100} \right). \qquad (22.42)$$

We have put $n\mu_j$ instead of $\mu_j$ in the second and third rows of the determinant, as we are
clearly entitled to do.

Similarly we find for the second- and third-order parabolas:

$$Y = 3.551 - 0.929 \left( \frac{X}{100} \right) + 1.070 \left( \frac{X}{100} \right)^2 \qquad (22.43)$$

$$Y = 7.783 - 8.940 \left( \frac{X}{100} \right) + 5.875 \left( \frac{X}{100} \right)^2 - 0.919 \left( \frac{X}{100} \right)^3. \qquad (22.44)$$

Fig. 22.1 shows the straight line and cubic fitted to the data by these means. An examination
of the coefficients in the equations illustrates the point made above, that as successive
terms are added to the polynomials the coefficients of all terms may alter very considerably.



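Consider now the alternative approach by the use of orthogonal polynomials. By
the use of equations (22.33) we have

$$P_1 = \begin{vmatrix} 16 & 2642 \\ 1 & X \end{vmatrix} \div 16 = X - 165.125.$$

$$P_2 = \begin{vmatrix} 16 & 2642 & 474{,}050 \\ 2642 & 474{,}050 & 91{,}244{,}582 \\ 1 & X & X^2 \end{vmatrix} \div \begin{vmatrix} 16 & 2642 \\ 2642 & 474{,}050 \end{vmatrix} = X^2 - 343.137X + 27{,}032.436.$$

$$P_3 = \begin{vmatrix} 16 & 2642 & 474{,}050 & 91{,}244{,}582 \\ 2642 & 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 \\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 & 3{,}930{,}294{,}225{,}302 \\ 1 & X & X^2 & X^3 \end{vmatrix} \div \begin{vmatrix} 16 & 2642 & 474{,}050 \\ 2642 & 474{,}050 & 91{,}244{,}582 \\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 \end{vmatrix}$$
$$\quad = X^3 - 522.940X^2 + 87{,}182.434X - 4{,}605{,}047.$$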


The b-coefficients are given by (22.37), the determinants in the numerator having been
already tabulated in finding the P’s. We have

$$b_0 = 5.1856, \quad b_1 = \frac{2.7409}{100}, \quad b_2 = \frac{1.0695}{100^2}, \quad b_3 = -\frac{0.91889}{100^3},$$

these being the values already found in arriving at (22.42) to (22.44). Thus

$$Y = 5.1856 + \frac{2.7409}{100} (X - 165.125) + \frac{1.0695}{100^2} (X^2 - 343.137X + 27{,}032.4)$$
$$\quad - \frac{0.91889}{100^3} (X^3 - 522.940X^2 + 87{,}182.4X - 4{,}605{,}047). \qquad (22.45)$$

If we stop at the second term we have

$$Y = 5.1856 + \frac{2.7409}{100} (X - 165.125) = 0.660 + 2.741 \left( \frac{X}{100} \right),$$

which is the same as (22.42), as of course it must be. Similarly, if we stop at the third or
fourth terms we find equations (22.43) or (22.44).

Now consider the fit of the regression line. We have from (22.35),

$$b_p^2 \sum (P_p^2) = n b_p^2 \frac{\Delta^{(p)}}{\Delta^{(p-1)}} = b_p \sum (Y P_p).$$

The determinants in this expression have already been evaluated in finding the regression
line. Remembering that $\sum (y^2)$ = 459.436 we obtain the following:

Order j.    b_j.                   n b_j^2 Δ^(j)/Δ^(j-1).    U (equation (22.28)).

   0         5.1856                     430.247                   29.189
   1         2.7409 × 10^-2              28.390                    0.799
   2         1.0695 × 10^-4               0.669                    0.130
   3       - 0.91889 × 10^-6              0.080                    0.050

In calculations of this kind it is as well to take $b_j$ to an extra place of decimals, as the value
of U is rather sensitive to small errors of rounding up. Even so, the last figure in U is
unreliable.

From the values of U it is clear that the fit is greatly improved by taking a quadratic 
term, and still further improved by adding the cubic term. How far a quartic term would 
improve matters cannot be decided without ascertaining the term. We have, however, 
not proceeded beyond the third degree because to do so would require moments of the 
eighth order. For a small population such as this, which in practical applications would 
be considered as a sample only, the errors in higher moments would probably be considerable. 

The reader who works through the arithmetic of this example will find that there is 
about the same labour involved in either method. It is in the fitting of higher order terms 
that the method of orthogonal polynomials shows its superiority. In practical cases it 
is preferable to avoid the large numbers arising from the evaluation of determinants by 
a modification of the procedure given in 22.27 below. 
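The arithmetic of this example can be reproduced mechanically. The sketch below takes the data of Table 22.1 and builds the orthogonal polynomials by Gram-Schmidt orthogonalisation of the powers of X rather than by the determinants of (22.33); that substitution, and the function name `orthogonal_fit`, are assumptions made for illustration. Because the machine carries more figures than the hand computation, the values of U may differ from the tabulated ones in the last place.

```python
import numpy as np

# Table 22.1: temperature X (degrees F.) and percentage loss in weight Y.
X = np.array([100, 105, 110, 115, 121, 132, 144, 153,
              163, 179, 191, 203, 212, 226, 237, 251], dtype=float)
Y = np.array([3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
              4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76])

def orthogonal_fit(x, y, order):
    """Coefficients b_j of (22.27) and residual sums of squares U of (22.28)."""
    cols = []
    for j in range(order + 1):
        v = x ** j
        for q in cols:                       # make P_j orthogonal to P_0..P_{j-1}
            v = v - (v @ q) / (q @ q) * q
        cols.append(v)
    P = np.column_stack(cols)
    norms = np.sum(P * P, axis=0)            # sum of P_j^2
    b = (P.T @ y) / norms
    U = y @ y - np.cumsum(b ** 2 * norms)    # U after terms 0, 1, 2, 3
    return b, U

b, U = orthogonal_fit(X, Y, 3)
for j in range(4):
    print(j, b[j], U[j])
```

The printed coefficients may be compared with $b_0, \dots, b_3$ above, and the successive values of U with the last column of the table.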


Example 22.4 . Grouped Data 

In Example 14.1 (vol. I, p. 331) we considered the correlation between age and highest
audible pitch in 3379 subjects and found the linear regressions. Let us take the work
a stage further.
For the data of the table (X = age, Y = pitch) we find:

$\sum (y)$ = − 708; $\sum (y^2)$ = 8894; $\sum (yx)$ = − 12,635;

$\sum (x)$ = 2604; $\sum (x^2)$ = 47,392; $\sum (x^3)$ = 387,498;

$\sum (x^4)$ = 4,842,172; $\sum (x^5)$ = 62,401,794; $\sum (x^6)$ = 883,576,012.

As a variation on the procedure of the previous example, we will convert these figures
to moments about the mean (with Sheppard’s corrections) and put them in standard measure.
We find:

$\mu'_{01}$ = − 0.209,529; $\mu_{02}$ = 2.504,904;

$\mu'_{10}$ = 0.770,642; $\mu_{20}$ = 13.348,229.

In standard measure the other moments are

$\mu_3$ = 1.705,375; $\mu_4$ = 6.295,759;

$\mu_5$ = 20.729,861; $\mu_6$ = 78.409,775.

We may now use equations (22.38), etc., direct, and find

$P_0 = 1$, $P_1 = X$, $P_2 = X^2 - 1.705X - 1$, $P_3 = X^3 - 3.471X^2 - 0.376X + 3.560$.

We now require the moments $\mu_{21}$ and $\mu_{31}$. We find

$\sum (yx^2)$ = − 112,495
$\sum (yx^3)$ = − 1,399,639,

and hence, with Sheppard’s corrections and in standard measure,

$\mu_{21}$ = − 1.177,920, $\mu_{31}$ = − 4.215,958.

We now find, from (22.37),

$b_0$ = 0
$b_1$ = − 0.613,626
$b_2$ = − 0.055,064
$b_3$ = 0.010,205.

The regression line of the third degree is then

$$Y = -0.6136X - 0.0551 (X^2 - 1.705X - 1) + 0.0102 (X^3 - 3.471X^2 - 0.376X + 3.560),$$

where the origin is at the mean and the units are in standard measure.

Standard Errors of Regression Coefficients 

22.15. The standard errors of unknowns derived from least squares can be found
by the use of a result due originally to Gauss. Suppose $\alpha_j$ is the true value of $a_j$ and the
residuals $y - \sum_j \alpha_j x^j$ are distributed normally with variance v. Writing $da_j = \alpha_j - a_j$,
we have for the frequency function of the residuals

$$f \propto \exp -\frac{1}{2v} \Big\{ \sum_i \Big( y - \sum_j a_j x^j \Big)^2 + \sum_i \Big( \sum_j da_j \, x^j \Big)^2 \Big\}$$

($\sum_i$ denoting summation over the sample and $\sum_j$ over the values $a_0$ to $a_p$, and the cross-term
vanishing because the a’s are minimal values);

$$\propto \text{constant} \times \exp -\frac{1}{2v} \sum_i \Big( \sum_j da_j \, x^j \Big)^2$$

$$\propto \exp -\frac{1}{2v} \sum_i \sum_{j,k} (da_j \, da_k \, x^{j+k})$$

$$\propto \exp -\frac{n}{2v} \sum_{j,k} (da_j \, da_k \, \mu_{j+k}). \qquad (22.46)$$

In the limit, then, the deviations are distributed in the multivariate normal form, and from
the results of 15.12 (vol. I, p. 376) it follows that

$$\text{var } a_j = \frac{v}{n} \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}, \qquad (22.47)$$

for the determinant whose terms are $\mu_{j+k}$ is in fact the determinant we have already defined
as $\Delta^{(p)}$, and $\Delta^{(p)}_{jj}$ is the minor of the item in the jth row and column.

Now v is the variance of deviations from the theoretical regression line, and in terms
of variations about the observed line we have, remembering the result of 18.17,

$$\text{var } a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}} \cdot \frac{\text{var } e}{n - p - 1}. \qquad (22.48)$$

Since the correlation ratio of y on x is given by

$$\text{var } e = \text{var } y \, (1 - \eta^2),$$

we have also

$$\text{var } a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}} \cdot \frac{(1 - \eta^2) \, \text{var } y}{n - p - 1}. \qquad (22.49)$$

For large samples the replacement of n by n − p − 1 in the denominator is an unnecessary
refinement.


22.16. For the case of orthogonal polynomials the results apply with a slight but
important simplification. The coefficient $b_j$ is the same as $a_j$ if polynomials up to order j
only are fitted, and hence, since $\Delta^{(j)}_{jj} = \Delta^{(j-1)}$, we have

$$\text{var } b_j = \frac{\Delta^{(j-1)}}{\Delta^{(j)}} \cdot \frac{(1 - \eta^2) \, \text{var } y}{n - j - 1}. \qquad (22.50)$$

The same result follows by modifying (22.46), which for orthogonal polynomials becomes

$$f \propto \exp -\frac{1}{2v} \sum_j \Big\{ \sum_i (P_j^2) \Big\} (db_j)^2, \qquad (22.51)$$

showing that the b’s are independently and normally distributed with variance

$$\text{var } b_j = \frac{v}{\sum (P_j^2)},$$

reducing to (22.50) in virtue of (22.35).


22.17. If the parent population is normal, $\eta = \rho$, and the determinants $\Delta^{(j)}$ can be
evaluated explicitly in terms of the variance of x. In fact,

$$\frac{\Delta^{(j-1)}}{\Delta^{(j)}} = \frac{1}{j! \, (\text{var } x)^j}, \qquad (22.52)$$

and hence

$$\text{var } b_j = \frac{(1 - \rho^2) \, \text{var } y}{(n - j - 1) \, j! \, (\text{var } x)^j}, \qquad (22.53)$$

or, in standard measure,

$$\text{var } b_j = \frac{1 - \rho^2}{(n - j - 1) \, j!}. \qquad (22.54)$$

Equation (22.52) can be found by evaluating the determinants in the ordinary way, but
it follows more simply from the consideration that $\Delta^{(j)} / \Delta^{(j-1)}$ is equal to $\frac{1}{n} \sum P_j^2$, which, in the
normal case, is for large samples equal to $E (P_j^2) = j! \, (\text{var } x)^j$ (6.22, vol. I, p. 147, with
a change of scale).
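The value $E(P_j^2) = j!$ for a normal variate in standard measure can be confirmed by numerical integration. The sketch below is an illustration only: it uses NumPy's probabilists' Hermite module (`hermegauss` for Gauss quadrature with weight $e^{-x^2/2}$, `hermeval` for evaluating the polynomials), and the helper name `mean_square` is hypothetical.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss quadrature nodes and weights for the weight function exp(-x^2 / 2).
nodes, weights = hermegauss(20)
weights = weights / sqrt(2 * pi)        # rescale so the weights sum to unity

def mean_square(j):
    """E(P_j^2) for the Tchebycheff-Hermite polynomial of degree j."""
    coeffs = [0] * j + [1]              # selects the polynomial of degree j
    return float(np.sum(weights * hermeval(nodes, coeffs) ** 2))

values = [mean_square(j) for j in range(5)]
print(values)                            # compare with 0!, 1!, 2!, 3!, 4!
print([factorial(j) for j in range(5)])
```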


22.18. The advantages of using orthogonal polynomials instead of powers of X 
are apparent in the forms taken by the standard errors of the coefficients a and b. The
latter are independent of the order of the polynomial fitted and can be tested once and for 
all. The former do not possess this advantage. It seems preferable, therefore, as a matter 
of technique, to work with orthogonal polynomials throughout, whenever regressions of 
order higher than the first are likely to require investigation. 


Example 22.5 

Consider again the data of Example 22.4 (regression of highest audible pitch on age).
We have there expressed the regression line in standard measure and in the orthogonal
form, and may therefore use equation (22.50) in the form

$$\text{var } b_1 = \frac{1 - \eta^2}{n} \frac{\Delta^{(0)}}{\Delta^{(1)}}, \quad \text{var } b_2 = \frac{1 - \eta^2}{n} \frac{\Delta^{(1)}}{\Delta^{(2)}}, \quad \text{var } b_3 = \frac{1 - \eta^2}{n} \frac{\Delta^{(2)}}{\Delta^{(3)}}.$$

(The sample number n is so large that we can ignore the element − (j + 1) in the divisor.)
The determinants required are already known, having been ascertained in the course of
the work. We have

$$\frac{\Delta^{(0)}}{\Delta^{(1)}} = 1, \quad \frac{\Delta^{(1)}}{\Delta^{(2)}} = 0.4189, \quad \frac{\Delta^{(2)}}{\Delta^{(3)}} = 0.0985.$$

We also require $\eta$, which was found in Example 14.11 (vol. I, p. 352) to be $\eta_{yx}$ = 0.6231.
Thus $1 - \eta^2$ = 0.6117. We find

$$\text{var } b_1 = \frac{1.8104}{10^4}, \quad \text{var } b_2 = \frac{0.7584}{10^4}, \quad \text{var } b_3 = \frac{0.1783}{10^4}.$$

The values of the b’s and their standard errors are then

Order.        b.          Standard Error.

  1       - 0.6136           0.013
  2       - 0.0551           0.0087
  3         0.0102           0.0042

In all cases we should judge the coefficients significant, as being more than twice the standard
error. Although, therefore, the second- and third-order terms are small and the regression
is approximately linear, the deviation from linearity is not merely a chance effect.
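The arithmetic of this example is short enough to set out in full. The sketch below uses only the figures quoted above (n = 3379, $1 - \eta^2$ = 0.6117 and the three determinantal ratios); the variable names are arbitrary.

```python
from math import sqrt

n = 3379
one_minus_eta_sq = 0.6117
ratios = [1.0, 0.4189, 0.0985]        # Delta^(j-1) / Delta^(j) for j = 1, 2, 3
b = [-0.6136, -0.0551, 0.0102]        # coefficients found in Example 22.4

# var b_j = (1 - eta^2) / n * Delta^(j-1) / Delta^(j), the element -(j+1) being ignored
variances = [one_minus_eta_sq / n * r for r in ratios]
std_errors = [sqrt(v) for v in variances]

for j, (coef, se) in enumerate(zip(b, std_errors), start=1):
    print(j, coef, round(se, 4), round(abs(coef) / se, 1))
```

Each coefficient exceeds twice its standard error, in agreement with the conclusion drawn in the text.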


Exact Significance Tests in the Normal Case 

22.19. When the parent population is normal, more exact tests than those derived 
from the use of standard errors may be obtained. We have already seen (14.21, vol. I,
p. 348) that a function dependent only on sample values and the first regression coefficient
$b_1$ was distributed in “Student’s” form. We proceed to generalise this result.

Consider in the first place the linear regression equation

$$Y = \bar{y} + b_1 (X - \bar{x}), \qquad (22.55)$$

and let $\beta_1$ be the population value of $b_1$ and $\sigma_2^2$ the variance of y in the population. Since
the parent is normal, the variance of y for any fixed value of x is $\sigma_2^2$.

Our estimate of $\beta_1$ is

$$b_1 = \frac{\sum y (x - \bar{x})}{\sum (x - \bar{x})^2}, \qquad (22.56)$$

where summation takes place over the sample values. Thus for fixed values of x we have

$$\text{var } b_1 = \frac{\sum (x - \bar{x})^2 \, \text{var } y}{\{ \sum (x - \bar{x})^2 \}^2} = \frac{\sigma_2^2}{\sum (x - \bar{x})^2}. \qquad (22.57)$$

Thus, since the mean of the distribution of $b_1$ is $\beta_1$, we see that, for samples having the
same x’s as those observed, $b_1$ is normally distributed about mean $\beta_1$ with variance given
by (22.57); normally because it is a linear function of the y’s which are themselves normal.
Consequently,

$$\frac{(b_1 - \beta_1) \sqrt{\sum (x - \bar{x})^2}}{\sigma_2} \qquad (22.58)$$

is distributed normally about zero mean with unit variance.

If $\sigma_2$ were known this would provide a test of significance of $b_1$ in the ordinary way;
but in fact $\sigma_2$ is not known and the substitution of an estimate distributed in the Type III
form brings in the t-distribution in the usual way. We take as our estimator of $\sigma_2$ the
function s, where

$$s^2 = \frac{1}{n - 2} \sum (y - Y')^2, \qquad (22.59)$$

and Y' represents the values “predicted” by the regression line, that is, the values

$$Y' = \bar{y} + b_1 (x - \bar{x}). \qquad (22.60)$$

Thus $s^2$ is based on the sum of squares of residuals. We shall show presently that $s^2$ is
distributed in the Type III form with n − 2 degrees of freedom independently of $b_1 - \beta_1$.
It follows that

$$t = \frac{(b_1 - \beta_1) \sqrt{\sum (x - \bar{x})^2} \, \sqrt{n - 2}}{\sqrt{\sum (y - Y')^2}}$$

is distributed as “Student’s” t with $\nu = n - 2$.
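By way of illustration, the statistic may be computed for the straight line fitted to the soil data of Table 22.1, testing the hypothetical value $\beta_1 = 0$. This is a sketch only; the variable names are arbitrary, and the final line merely checks the familiar identity between this t and the product-moment correlation r.

```python
import numpy as np

# Table 22.1: temperature X and percentage loss in weight Y.
X = np.array([100, 105, 110, 115, 121, 132, 144, 153,
              163, 179, 191, 203, 212, 226, 237, 251], dtype=float)
Y = np.array([3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
              4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76])

n = len(X)
dx = X - X.mean()
b1 = (dx @ Y) / (dx @ dx)                  # slope, equation (22.56)
fitted = Y.mean() + b1 * dx                # predicted values, equation (22.60)
resid_ss = np.sum((Y - fitted) ** 2)       # sum of squares of residuals

beta1 = 0.0                                # hypothetical value under test
t = (b1 - beta1) * np.sqrt(dx @ dx) * np.sqrt(n - 2) / np.sqrt(resid_ss)

r = np.corrcoef(X, Y)[0, 1]
print(b1, resid_ss, t, r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2))
```

The value of t is to be referred to “Student’s” distribution with n − 2 = 14 degrees of freedom.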

A given value $\beta_1$ may be tested accordingly. But we notice that the inference is a
conditional one, that is to say, we are considering the distribution of t in a sub-population
for which the x’s are the same as those actually observed. (Cf. 21.47.)
22.20. To establish the foregoing result we have to show that $\sum (y - Y')^2$, the sum
of squares of residuals about the observed regression line, is distributed in the Type III
form with n − 2 degrees of freedom. This is a particular case of a general theorem we
shall prove at the beginning of the next chapter, but we will sketch an ad hoc proof here
for the sake of completeness.

Since the population is normal, the deviations of y from the true regression line for
fixed x’s, $Y = \beta_0 + \beta_1 (X - \bar{x})$, where $\beta_0$ is the parent mean of y, is normal with variance
$\sigma_2^2$. Now

$$(n - 2) \frac{s^2}{\sigma_2^2} = \frac{1}{\sigma_2^2} \sum (y - Y')^2 = \frac{1}{\sigma_2^2} \sum \{ y - b_0 - b_1 (x - \bar{x}) \}^2$$
$$\quad = \frac{1}{\sigma_2^2} \sum \{ y - \beta_0 - \beta_1 (x - \bar{x}) - (b_0 - \beta_0) - (b_1 - \beta_1)(x - \bar{x}) \}^2.$$

The coefficients $b_0$ and $b_1$ were chosen so as to minimise this sum, and hence

$$(n - 2) \frac{s^2}{\sigma_2^2} = \frac{1}{\sigma_2^2} \sum \{ y - \beta_0 - \beta_1 (x - \bar{x}) \}^2 - \frac{n}{\sigma_2^2} (b_0 - \beta_0)^2 - \frac{(b_1 - \beta_1)^2}{\sigma_2^2} \sum (x - \bar{x})^2. \qquad (22.61)$$

The first term is the sum of squares of n normal variates with zero mean and unit variance;
the second is also such a variate, for it is the square of the deviation of the mean of y about
its true value divided by the variance $\sigma_2^2 / n$; and the third term is also such a variate, as
shown above.


It does not follow immediately that $(n - 2) s^2 / \sigma_2^2$ is distributed as the sum of squares of
n − 2 normal variates in standard measure, for the constituent items might be correlated.
Let us then find an orthogonal transformation to new variates $\xi_1, \dots, \xi_n$ linearly related
to the n normal variates $y - \beta_0 - \beta_1 (x - \bar{x})$. These also will be normally and independently
distributed. In particular (remembering that our summations refer to the
y’s and x’s, but the latter are constant for our distributions), take

$$\xi_1 = \frac{1}{\sigma_2 \sqrt{n}} \sum \{ y - \beta_0 - \beta_1 (x - \bar{x}) \} = \frac{\sqrt{n}}{\sigma_2} (b_0 - \beta_0)$$

$$\xi_2 = \frac{1}{\sigma_2} \sum \left[ \frac{(x - \bar{x}) \{ y - \beta_0 - \beta_1 (x - \bar{x}) \}}{\sqrt{\sum (x - \bar{x})^2}} \right] = \frac{1}{\sigma_2} (b_1 - \beta_1) \sqrt{\sum (x - \bar{x})^2}.$$

$\xi_1$ and $\xi_2$ are then normal variates in standard measure. Moreover they are orthogonal since
the sum of products of their coefficients is

$$\sum \frac{1}{\sigma_2 \sqrt{n}} \cdot \frac{x - \bar{x}}{\sigma_2 \sqrt{\sum (x - \bar{x})^2}} = k \sum (x - \bar{x}) = 0.$$

Consequently our transformation exhibits the first term on the right in (22.61) as $\sum_{j=1}^{n} \xi_j^2$ and
the second and third as $\xi_1^2$ and $\xi_2^2$. Thus the total is distributed as $\sum_{j=3}^{n} \xi_j^2$, which is the
result required.

We may compare the result of 18.17, in which we saw that the mean value of $\epsilon^2$
was n, whereas that of $e^2$ was n − p − 1, one degree of freedom having been lost in the
sum of squares of residuals for every constant estimated, and the approximate result of
21.20 in which $\chi^2$ had to lose a degree for each constant fitted by maximum likelihood.
Fundamentally all these results are different aspects of the same thing and rest on the fact 
that the variation of the sum of squares of normal variates in standard measure is spherically 
symmetric, so that a hyperplane in the sample space “ cuts ” the distribution in a spheri¬ 
cally symmetric form of one lower degree of freedom. 


Extension to Curvilinear Regression 

22.21. The foregoing result can be extended without difficulty to the case when 
the regression is curvilinear. If 

$$Y = b_0 P_0 + b_1 P_1 + \dots + b_p P_p,$$

where the P’s are orthogonal, then

$$b_j = \frac{\sum (y P_j)}{\sum P_j^2},$$

and we have also, for the variance of $b_j$ when the x’s are fixed,

$$\text{var } b_j = \frac{\sigma_2^2}{\sum P_j^2},$$

so that

$$\frac{(b_j - \beta_j) \sqrt{\sum P_j^2}}{\sigma_2}$$

is distributed normally with zero mean and unit variance. Taking as our estimate of $\sigma_2^2$

$$s^2 = \frac{1}{n - j - 1} \sum (y - Y')^2,$$

we see, as before, that

$$t = \frac{(b_j - \beta_j) \sqrt{n - j - 1} \, \sqrt{\sum P_j^2}}{\sqrt{\sum (y - Y')^2}} \qquad (22.62)$$

is distributed as “Student’s” t with $\nu = n - j - 1$ degrees of freedom.

It will be observed that in this and the previous section we have not assumed anything
about the distribution in x-arrays. We have merely supposed that for any given x, y is
normally distributed with constant variance.


Example 22.6 

Consider again the soil data of Example 22.3. We found, for the cubic term in the
parabola, a coefficient of − 0.9189 × 10^-6. Is this significant?

Here $b_j - \beta_j = -0.9189 \times 10^{-6}$ for j = 3;

$$\sqrt{n - j - 1} = \sqrt{16 - 4} = 3.464.$$

We have already found $\sum (y - Y')^2 = U$, namely

U = 0.050.

We further require $\sum P_3^2$, which has been obtained incidentally in the working of Example
22.3 and is equal to $9.31525 \times 10^{10}$. Hence, numerically,

$$|t| = \frac{0.9189 \times 10^{-6} \, (3.464) \, (3.052 \times 10^{5})}{0.2236} = 4.3.$$

This is highly significant.
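The computation follows directly from (22.62) and can be set out in a few lines. The figures are those quoted in the example; the variable names are arbitrary.

```python
from math import sqrt

n, j = 16, 3
b3 = -0.9189e-6            # cubic coefficient from Example 22.3
U = 0.050                  # residual sum of squares about the cubic
sum_P3_sq = 9.31525e10     # sum of P_3^2 over the sample

# Equation (22.62) with the hypothetical value beta_3 = 0.
t = b3 * sqrt(n - j - 1) * sqrt(sum_P3_sq) / sqrt(U)
print(round(abs(t), 1))
```

The result is referred to “Student’s” distribution with n − j − 1 = 12 degrees of freedom.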
Case when the Independent Variate proceeds by Equal Steps

22.22. An important special case arises when the independent variate has values
which are equidistant, as, for instance, in most time-series and in grouped data. If we take
the interval between successive values of x as our unit, the variate-values may, by a suitable
choice of origin, be taken as 0, 1, 2, . . ., n − 1. The various moment-functions
entering into the expressions for polynomials, etc., may be written down once for
all. Furthermore, this case lends itself to simpler summatory methods of forming the
actual polynomial values and the residuals.


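22.23. For a set of values 0, 1, 2, . . ., n − 1, we have

$$\sum (x) = \frac{n (n - 1)}{2}, \quad \sum (x^2) = \frac{n (n - 1)(2n - 1)}{6}, \quad \sum (x^3) = \frac{n^2 (n - 1)^2}{4}, \text{ etc.}$$

$$\mu'_1 = \tfrac{1}{2} (n - 1), \quad \mu_2 = \frac{n^2 - 1}{12}, \quad \mu_3 = 0, \text{ etc.}$$

From (22.38) and similar equations we then find

$$P_1 = x - \tfrac{1}{2} (n - 1), \quad P_2 = P_1^2 - \frac{n^2 - 1}{12}, \qquad (22.63)$$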

and so on. The polynomials may be obtained more systematically as follows :— 

We show first of all that 

where A j is the jth terminal difference of P p and the x’s range from 0 to n — 1. In fact, 
from Newton’s interpolation formula, 

T* V^ [il A A 


- Lit • 


and since the P’s are orthogonal, 

L (x + q — l) ta-11 P p = 0, q < p. 

X 

Substituting from (22.66), we find for the term in A 1 P p — 

Z(x+q- ~P p = X{(x+ q)*>+» -(x+q- P p 


. ( 22 . 66 ) 


( 22 . 66 ) 


Thus for all q from 1 to p we have 


' *U + q)jl P 


+ ,, * + "(JTOT P ’ 

(n — 1) 1 \ j ) j + q p ’ 

whence follows (22.64). We now find functions obeying these conditions. 
Consider 


g —l)!„/» —1\ A* D . 

r> L H j )rr , p - * <p 


y = C (x + p) [p \ 


. (22.67) 
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This is a polynomial of degree p, and if for a; = 0,1, . . . p it assumes the values y t , . . . y p 
we have— 


y(x)=C xh’+H f'' yd—l hl— 


( 22 . 68 ) 


for this also is of degree p and has the right values at x — 0 , . . . p. Taking now 


Vt = (»- l) 1 - (P -J ) 1 /_ w-i# P 

Vi (» — j — 1)! 1 ’ 


. (22.69) 


we find that for x = — q 


= C i-ir P r . 


. (22.70) 


Now from the definition of y this clearly vanishes for —# = <7 = 1 , . . . p, and 1 thus 
(22.70) is zero. Comparing it with (22.64) we see that the conditions are satisfied if we 
give to y i the value of A i of (22.69), i.e. 

■ ■ ■ < 22 - ,1) 

The constant C is evaluated by the fact that the coefficient of X v in P p is unity, giving 
A p P p — p ! This gives 

(2p )! (» — j» — 1)! 1 ; 

Finally, substituting in (22.65), we find 

-7,- d ,*<*-»■■•<*-' + «• < 22 ”> 

where by convention the term X m is unity for j = 0 . The first six polynomials are 

$$P_1 = x - \tfrac{1}{2} (n - 1)$$

$$P_2 = P_1^2 - \frac{n^2 - 1}{12}$$

$$P_3 = P_1^3 - \frac{3n^2 - 7}{20} P_1$$

$$P_4 = P_1^4 - \frac{3n^2 - 13}{14} P_1^2 + \frac{3 (n^2 - 1)(n^2 - 9)}{560}$$

$$P_5 = P_1^5 - \frac{5 (n^2 - 7)}{18} P_1^3 + \frac{15n^4 - 230n^2 + 407}{1008} P_1$$

$$P_6 = P_1^6 - \frac{5 (3n^2 - 31)}{44} P_1^4 + \frac{5n^4 - 110n^2 + 329}{176} P_1^2 - \frac{5 (n^2 - 1)(n^2 - 9)(n^2 - 25)}{14784} \qquad (22.74)$$

Four more values are given by Allan (1930), to whom the above derivation of (22.73)
is due.

Values of the polynomials up to and including the fifth are given in Fisher and Yates’ 
Statistical Tables up to n = 52. 


22.24. We can now find an explicit expression for 27 P*. Since the polynomials 
are orthogonal we have 

zpi=z(x+ v r'p lt 

which, by the argument resulting in (22.64), leads to 


vp* _ f 1 ( n + P) ! & p 

P ^ i \ (n - / - l)| p +j + 1 


o 


Putting q = p + 1 in (22.67) and (22.70), we have 


V ( — q) — C ( — 1 )M = (- 1)" (2 p + 1)Ip+u 
whence, after a little rearrangement, 


t.C 7 % 


_ p 

+ j + 1 P ’ 


2 (n + p)\ 
j ! (n -j - 


A* P p _ ( y !) 2 ( n + p )! c% 

l)!p+i + l (2p + !)!(»-!)! ’ 


and thus, substituting for C from (22.72), we find

$$\sum P_p^2 = \frac{(p!)^4}{(2p)! \, (2p + 1)!} \, n (n^2 - 1)(n^2 - 4) \dots (n^2 - p^2). \qquad (22.75)$$
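Formula (22.75) lends itself to an exact numerical check. The sketch below, which is an illustration and not part of the derivation, builds the monic orthogonal polynomials on the values 0, 1, . . ., n − 1 by Gram-Schmidt orthogonalisation in exact rational arithmetic and compares the sums of squares with the closed form; the function names are hypothetical and n = 13 is chosen to match the example that follows.

```python
from fractions import Fraction
from math import factorial

def sums_of_squares(n, max_order):
    """Exact sums of P_p^2 over x = 0..n-1 for monic orthogonal polynomials."""
    xs = [Fraction(i) for i in range(n)]
    polys, out = [], []
    for p in range(max_order + 1):
        v = [x ** p for x in xs]
        for q in polys:                         # orthogonalise against lower orders
            coef = sum(a * b for a, b in zip(v, q)) / sum(b * b for b in q)
            v = [a - coef * b for a, b in zip(v, q)]
        polys.append(v)
        out.append(sum(a * a for a in v))
    return out

def closed_form(n, p):
    """Right-hand side of (22.75)."""
    value = Fraction(factorial(p) ** 4, factorial(2 * p) * factorial(2 * p + 1)) * n
    for k in range(1, p + 1):
        value *= n * n - k * k
    return value

n = 13
direct = sums_of_squares(n, 4)
formula = [closed_form(n, p) for p in range(5)]
print(direct)
print(formula)
```

For n = 13 the first four values are 13, 182, 2002 and 20,592.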


22.25. 

differences. 


It is also possible to express the orthogonal polynomials in terms of central 
We quote without proof the results (for details of which see Allan, 1930) :— 


where 


Pp = 


pi 

(p - h )! 


I\ X 


(- 1 np - j -j)l 

(p - 2 j )! j ! 2’ iJ 


[P t y- 2/-i 


W" = 


{x + |(w - 1)}! 

(a: — £(» - i)}! ' 


(22.76) 

(22.77) 


The series is summed from j — 0 until 2j > p, when the denominator vanishes and (p — J)! 
is written for F (p + £) to preserve the factorial notation. In practice the polynomials 
for particular examples are not determined from (22.73) or (22.76) but by the use of tables, 
or by summation from differences in the manner of Example 22.9 below. 


Example 22.7 

For the fitting of a regression line in the case of equidistant intervals various methods 
are in use. A choice between them depends on the length of the series, the order of regres¬ 
sion to which it is desired to go, and the computing resources at the investigator’s disposal. 
We will illustrate two methods in this and the next example. 

TABLE 22.2

Fitting of Regression Line by Orthogonal Polynomials: Equidistant x-intervals.

  (1)      (2)         (3)          (4)      (5)        (6)
 Year.   Variate.   Population     P_2.   (1/6) P_3.  (7/12) P_4.
           P_1.     (million).
                        Y

 1811      - 6        10.16         22      - 11         99
 1821      - 5        12.00         11         0       - 66
 1831      - 4        13.90          2         6       - 96
 1841      - 3        15.91        - 5         8       - 54
 1851      - 2        17.93       - 10         7         11
 1861      - 1        20.07       - 13         4         64
 1871        0        22.71       - 14         0         84
 1881        1        25.97       - 13       - 4         64
 1891        2        29.00       - 10       - 7         11
 1901        3        32.53        - 5       - 8       - 54
 1911        4        36.07          2       - 6       - 96
 1921        5        37.89         11         0       - 66
 1931        6        39.95         22        11         99



In Table 22.2, column 3 shows the population of England and Wales (in millions)
for the years shown in column 1. These are at ten-yearly intervals, and the variate-values
in units of 10 with origin at the mid-point of the range are given in column (2). These
are the values of $P_1$.

The corresponding values of $P_2$, $P_3$ and $P_4$ are given in the last three columns (the
last two being tabulated as the multiples $\tfrac{1}{6} P_3$ and $\tfrac{7}{12} P_4$). They
may be calculated direct from (22.74), but are most conveniently taken direct from the
Fisher-Yates tables.

We find, for n = 13, 


$\sum Y P_1$ = 474.77
$\sum Y P_2$ = 123.19
$\sum Y P_3$ = − 39.38 × 6 = − 236.28
$\sum Y P_4$ = − 374.30 × $\tfrac{12}{7}$ = − 641.657,143,

and, direct from the tables,

$\sum P_1^2$ = 182, $\sum P_2^2$ = 2002, $\sum P_3^2$ = 572 × 36,
$\sum P_4^2$ = 68,068 × $(\tfrac{12}{7})^2$.

Hence, from equations of the type $b_p = \sum Y P_p / \sum P_p^2$, we find

$b_1$ = 2.608,626, $b_2$ = 0.061,533,467, $b_3$ = − 0.011,474,359, $b_4$ = − 0.003,207,699

and the quartic curve is

$$Y - 24.1608 = 2.6086X + 0.061{,}53 (X^2 - 14) - 0.011{,}47 (X^3 - 25X) - 0.003{,}208 \left( X^4 - \tfrac{247}{7} X^2 + 144 \right). \qquad (22.78)$$

We can now find the residuals for each term in this equation. We find

$\sum Y^2$ = 8839.9389
$\sum Y$ = 314.09.
Hence the sum of squares of Y about the mean of Y,

$$\sum (Y - \bar{Y})^2 = 1251.283.$$

Thus we have:

                                                     Contribution.   Residual Sum of Squares.

Original variation                                                        1251.283
Contribution of first term  = b_1 Σ(Y P_1)             1238.497             12.786
Contribution of second term = b_2 Σ(Y P_2)                7.580              5.206
Contribution of third term  = b_3 Σ(Y P_3)                2.711              2.495
Contribution of fourth term = b_4 Σ(Y P_4)                2.058              0.437

For the variance of the residual elements we divide by the number of degrees of freedom 
(n — j — 1) and obtain 


Residual Sum of Squares.    Divisor.    Residual Variance.

        12.786                 11            1.162
         5.206                 10            0.521
         2.495                  9            0.277
         0.437                  8            0.055
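The whole of this working may be reproduced as follows. The sketch uses the monic polynomials of (22.74) for n = 13 directly, rather than the integer multiples tabulated by Fisher and Yates, so that no scale factors are needed; the variable names are arbitrary, and small differences from the printed figures in the last place are to be expected from rounding.

```python
import numpy as np

# Table 22.2: population of England and Wales (millions), 1811 to 1931.
Y = np.array([10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
              25.97, 29.00, 32.53, 36.07, 37.89, 39.95])
x = np.arange(-6, 7, dtype=float)          # P_1, origin at the mid-point

# Monic orthogonal polynomials from (22.74) with n = 13.
P = np.column_stack([x,
                     x ** 2 - 14,
                     x ** 3 - 25 * x,
                     x ** 4 - (247 / 7) * x ** 2 + 144])

sums_YP = P.T @ Y                          # sum of Y P_j
norms = np.sum(P * P, axis=0)              # sum of P_j^2
b = sums_YP / norms                        # b_1 .. b_4

total = np.sum((Y - Y.mean()) ** 2)        # sum of squares about the mean
residual = total - np.cumsum(b * sums_YP)  # after 1, 2, 3, 4 terms
variance = residual / (len(Y) - np.arange(1, 5) - 1)

print(b)
print(residual)
print(variance)
```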


Fig. 22.2 shows the data graphically with the cubic and quartic of closest fit. 



Fig. 22.2. Cubic (full line) and Quartic (broken line) Parabolas fitted to the Data of Table 22.2.


The fit is evidently a good one, as is borne out by the smallness of the residual variance, 
but we must sound a warning as to the use of this polynomial. For interpolation in the 
variate range it would probably suit very well; but for extrapolation outside the range 
it is dangerous unless there is good reason to suppose that the polynomial has some theoretical 
basis (which is not so). It would, for instance, be most unsafe to try and estimate the 
population in 1960 by inserting X = 9 in equation (22.78).
Example 22.8

In Chapter 3 it was seen that factorial moments can be derived by summatory pro¬ 
cesses. A somewhat similar method can be used to fit orthogonal polynomials. We will 
illustrate it on the data of the previous example. 

TABLE 22.3

Fitting of Orthogonal Polynomials by Factorial Sums.

           S_0         S_1          S_2          S_3

          10.16       10.16        10.16        10.16
          12.00       22.16        32.32        42.48
          13.90       36.06        68.38       110.86
          15.91       51.97       120.35       231.21
          17.93       69.90       190.25       421.46
          20.07       89.97       280.22       701.68
          22.71      112.68       392.90      1094.58
          25.97      138.65       531.55      1626.13
          29.00      167.65       699.20      2325.33
          32.53      200.18       899.38      3224.71
          36.07      236.25      1135.63      4360.34
          37.89      274.14      1409.77      5770.11
          39.95      314.09      1723.86      7493.97

Totals   314.09     1723.86      7493.97


In Table 22.3 the column headed $S_0$ gives the value of Y. The next column, headed
$S_1$, gives the sums of the values in the first column proceeding from the top; and so for
the columns headed $S_2$ and $S_3$.

Now construct the quantities 


O, = - s t = — = 24-100,709 
n 13 

2! _ 2(1723-86) 

= -,-“7-7-, S x = ,"7 7- = 18-943,516 


n (n + 1) 


182 


__JLL_ - S. - tmm _ 16 470,264 

» (n + 1) (n + 2) 2730 


the general formula being 


Then obtain the quantities 


0 _ 0L±i U3. _. 

1 »(» + !)...(» +j)' 


a' 0 = a 0 — 24-160,769 

a x == a„ — o x = 5-217,253 

o, =Oo — 3«i + 2o, = 0-270,749, 


the general formula being 


g ,- p(y,+, . 1 ) + ( g-D . (P)(P + D(P + 8 ) _ 

p 0 (1 !)* 2 1 T (2 !)* 3 * 


. (22.79) 


. (22.80) 
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Finally put 


b t =a 0 = 24-160,769 


6 l= = 


6 


n — 1 


■ a, = 


_ 6 (5-217,253) 


12 


= 2-608,626 


6,= 

the general formula being 


30 

1) (n - 2) 


30(0-270,749) 

& 1 oo * * 


132 


b = ^) • _ top /no gl\ 

P (P !) 2 (» — 1) . . . (» — p). 

Then the b's are the coefficients of the orthogonal polynomials in the regression equation.
The values we have found check with those of the previous example and the reader may
care to work out $b_3$ and $b_4$ by the same method.
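
The summation method can be checked numerically. The following Python sketch is illustrative only: the function name `fisher_orthogonal_coefficients` and its arguments are our own and do not come from the text. It forms the successive sums from the top of the column, then applies (22.79), (22.80) and (22.81) to the thirteen values of Y in Table 22.3.

```python
from math import factorial, prod

def fisher_orthogonal_coefficients(y, degree):
    """Return (a, a_prime, b) for the summation method of (22.79)-(22.81)."""
    n = len(y)
    # S[j] holds S_{j+1}: the last entry of the (j+1)-th column of running sums.
    column = list(y)
    S = []
    for _ in range(degree + 1):
        running, sums = 0.0, []
        for value in column:
            running += value
            sums.append(running)
        S.append(sums[-1])
        column = sums
    # (22.79): a_j = (j+1)! S_{j+1} / {n (n+1) ... (n+j)}
    a = [factorial(j + 1) * S[j] / prod(n + i for i in range(j + 1))
         for j in range(degree + 1)]
    # (22.80): a'_p = sum_j (-1)^j (p+j)! / {(j!)^2 (p-j)! (j+1)} a_j
    a_prime = [sum((-1) ** j * factorial(p + j)
                   / (factorial(j) ** 2 * factorial(p - j) * (j + 1)) * a[j]
                   for j in range(p + 1))
               for p in range(degree + 1)]
    # (22.81): b_p = (2p+1)! / {(p!)^2 (n-1) ... (n-p)} a'_p
    b = [factorial(2 * p + 1)
         / (factorial(p) ** 2 * prod(n - i for i in range(1, p + 1))) * a_prime[p]
         for p in range(degree + 1)]
    return a, a_prime, b

# The Y values of Table 22.3.
Y = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
     25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
a, a_prime, b = fisher_orthogonal_coefficients(Y, 2)
```

With `degree` raised to 4 the same function gives the further coefficients which the reader is invited to work out, on the assumption that the general formulae hold as stated.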

This process is due to R. A. Fisher and avoids the direct calculation of the values of
the orthogonal polynomials. Its validity may be established by using equations (22.75)
and (22.73), which give

$$b_p = \frac{(2p)!\,(2p+1)!}{(p!)^4\, n(n^2-1)\cdots(n^2-p^2)} \sum (y P_p)$$

$$\quad = \frac{(2p+1)!}{(p!)^2\,(n-1)\cdots(n-p)} \sum_{j=0}^{p} (-1)^j \frac{(p+j)!}{(j!)^2\,(p-j)!\,(j+1)} \cdot \frac{(j+1)!}{n(n+1)\cdots(n+j)} \sum_{x=1}^{n} y_x\, \frac{(n-x+j)!}{(n-x)!\; j!}$$

The first part of the expression explains the coefficients in (22.81), the second part those
in (22.80). The third part gives rise to (22.79) when it is remembered that the sums S
are expressible as sums of factorials (cf. 3.10, vol. I, p. 58), but the summation takes place
from the top of the column.


Example 22.9 

As a rule it is unnecessary to evaluate the polynomial at all the points for which data
are given; but if the values are desired for comparison with observation they may be
obtained by summatory processes from the differences.

The terminal differences themselves are obtainable simply from the quantities $a'_p$ of
the previous example. For a polynomial of the first degree we have

$$\Delta Y = -\frac{6}{n-1}\, a'_1, \qquad Y = a'_0 + 3a'_1. \qquad (22.82)$$

For that of the second degree,

$$\Delta^2 Y = \frac{60}{(n-1)(n-2)}\, a'_2$$

$$\Delta Y = -\frac{6}{n-1}\,(a'_1 + 5a'_2)$$

$$Y = a'_0 + 3a'_1 + 5a'_2. \qquad (22.83)$$

For the third degree,

$$\Delta^3 Y = -\frac{840}{(n-1)(n-2)(n-3)}\, a'_3$$

$$\Delta^2 Y = \frac{60}{(n-1)(n-2)}\,(a'_2 + 7a'_3)$$

$$\Delta Y = -\frac{6}{n-1}\,(a'_1 + 5a'_2 + 14a'_3)$$

$$Y = a'_0 + 3a'_1 + 5a'_2 + 7a'_3. \qquad (22.84)$$

The formulae for higher degrees are constructed on analogous lines, the multiplying
factors for successive differences being given by

$$(-1)^p\, \frac{(p+1)(p+2)\cdots(2p+1)}{(n-1)(n-2)\cdots(n-p)}$$

and the coefficients of the a's by

                a'_0   a'_1   a'_2   a'_3   a'_4   a'_5

    Y             1      3      5      7      9     11
    ΔY                   1      5     14     30     55
    Δ²Y                         1      7     27     77
    Δ³Y                                1      9     44
    Δ⁴Y                                       1     11
    Δ⁵Y                                              1


We leave the proof of these results to the reader. 

For instance, for the data considered in the two previous examples we found, for the
parabola of the second degree,

$$Y = 24.160,8 + 2.608,6\,Z + 0.061,533\,(Z^2 - 14)$$

$$a'_0 = 24.160,769; \quad a'_1 = 5.217,253; \quad a'_2 = 0.270,749.$$

Hence, from (22.83),

$$\Delta^2 Y = \frac{60}{(n-1)(n-2)}\, a'_2 = 0.123,068$$

$$\Delta Y = -\frac{6}{n-1}\,(a'_1 + 5a'_2) = -3.285,499$$

$$Y = a'_0 + 3a'_1 + 5a'_2 = 41.166,273.$$

We then build up the polynomial values as shown in Table 22.4. The second difference
0.123,068 is shown at the foot of column (2). Being a constant, it could have been written

TABLE 22.4

Calculation of Polynomial Values from Differences.

      (1)          (2)            (3)            (4)          (5)          (6)
    Number of     Second         First        Polynomial    Observed    Difference
      Term.     Difference.    Difference.      Value.       Value.      (5)-(4)

       1                       - 1.808,68        9.863       10.16        0.297
       2                       - 1.931,75       11.795       12.00        0.205
       3                       - 2.054,82       13.849       13.90        0.051
       4                       - 2.177,88       16.027       15.91      - 0.117
       5                       - 2.300,95       18.328       17.93      - 0.398
       6                       - 2.424,02       20.752       20.07      - 0.682
       7                       - 2.547,09       23.299       22.71      - 0.589
       8                       - 2.670,16       25.969       25.97        0.001
       9                       - 2.793,23       28.763       29.00        0.237
      10                       - 2.916,29       31.679       32.53        0.851
      11                       - 3.039,36       34.718       36.07        1.352
      12                       - 3.162,43       37.881       37.89        0.009
      13        0.123,068      - 3.285,499      41.166,27    39.95      - 1.216


all the way up, but to do so is a waste of time (and in practice, of course, we should not
devote a separate column to it). The first difference is shown at the foot of column (3),
and the figures above it constructed by adding the second difference at each stage. The
polynomial values themselves are compiled by adding the first differences to the value
at the foot of the column, 41.166,27.

We have also shown the observed values and the difference between polynomial and
observed values. The sum of squares of the latter is 5.204, agreeing within the margin
of rounding-up error with the value for the sum of squares of residuals found in
Example 22.7.
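
The building-up of Table 22.4 may be sketched in a few lines of Python. This is an illustration under our own naming (`values_from_differences`, `A_PRIME` and `OBSERVED` are not in the text), restricted to the second-degree case of (22.83): the terminal value and differences are formed from the $a'$ of the previous example, and the column is then filled from the foot upwards by repeated addition.

```python
def values_from_differences(a_prime, n):
    """Second-degree polynomial values for terms 1..n, built from (22.83)."""
    a0, a1, a2 = a_prime
    second = 60.0 / ((n - 1) * (n - 2)) * a2       # constant second difference
    first = -6.0 / (n - 1) * (a1 + 5.0 * a2)       # first difference at the foot
    value = a0 + 3.0 * a1 + 5.0 * a2               # polynomial value at term n
    values = [value]
    for _ in range(n - 1):
        value += first        # move up one term
        first += second       # first difference for the next step up
        values.append(value)
    values.reverse()          # order the results from term 1 to term n
    return values

# Quantities a'_0, a'_1, a'_2 and the observed values, as quoted in the text.
A_PRIME = (24.160769, 5.217253, 0.270749)
OBSERVED = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
            25.97, 29.00, 32.53, 36.07, 37.89, 39.95]

fitted = values_from_differences(A_PRIME, len(OBSERVED))
residual_ss = sum((obs - fit) ** 2 for obs, fit in zip(OBSERVED, fitted))
```

The list `fitted` corresponds to column (4) of Table 22.4 and `residual_ss` to the sum of squares of column (6).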

As an exercise the reader should work out the polynomial values for the third- and 
fourth-order polynomials and compare the sum of squares of residuals with the values of 
Example 22.7. 

Multiple Curvilinear Regression 

22.26. We considered the linear regression of one variate on a number of others 
in Chapters 14 and 15. There now remains the extension of our results to the 
curvilinear case. 

The extension is very easy to carry out when we remember that in multiple linear 
regression there is no restriction on the degree of dependence among the “ independent ” 
variates. In particular, some of them may be functionally related, and more particularly 
still, one variate may be a power of another. It is thus clear that the process of fitting 
curved regression lines can be regarded as formally equivalent to that of fitting linear 
regressions. For instance, the fitting of

$$Y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_4 + a_5 X_5$$

is equivalent to

$$Y = a_0 + a_1 X_1 + a_2 X_1^2 + a_3 Z_1 + a_4 Z_1^2 + a_5 Z_1^3,$$

the latter being a particular case of the former where $X_2$ is the square of $X_1$ (and their
covariation accordingly complete) and similar relations exist between $X_3$, $X_4$ and $X_5$.

The case of curvilinear regression for a single variate, which has occupied the foregoing
part of the chapter, could then have been treated by the methods of Chapter 15.
We have discussed it afresh only because it is more easily dealt with by direct methods.

22.27. In multiple regression analysis it sometimes happens that, having worked out 
a regression equation, we wish either to take account of a new factor or to remove one 
which appears redundant. To avoid the necessity of solving a new set of determinantal 
equations the following device is useful:— 

Consider the case of three independent variates measured from their means,

$$Y = b_1 X_1 + b_2 X_2 + b_3 X_3. \qquad (22.85)$$

In accordance with our general method the constants b are given by

$$\begin{aligned}
b_1 \Sigma(x_1^2) + b_2 \Sigma(x_1 x_2) + b_3 \Sigma(x_1 x_3) &= \Sigma(x_1 y) \\
b_1 \Sigma(x_1 x_2) + b_2 \Sigma(x_2^2) + b_3 \Sigma(x_2 x_3) &= \Sigma(x_2 y) \\
b_1 \Sigma(x_1 x_3) + b_2 \Sigma(x_2 x_3) + b_3 \Sigma(x_3^2) &= \Sigma(x_3 y)
\end{aligned} \qquad (22.86)$$

Suppose now we replace the functions $\Sigma(xy)$ on the right by 1, 0, 0 and obtain the solutions
$b_1 = c_{11}$, $b_2 = c_{12}$, $b_3 = c_{13}$; and similarly for replacement by 0, 1, 0 and 0, 0, 1,
the solutions being written

$$\begin{aligned}
b_1 &= c_{11},\; c_{12},\; c_{13} \\
b_2 &= c_{12},\; c_{22},\; c_{23} \\
b_3 &= c_{13},\; c_{23},\; c_{33}
\end{aligned} \qquad (22.87)$$

Then the solution of (22.86) is

$$\begin{aligned}
b_1 &= c_{11} \Sigma(x_1 y) + c_{12} \Sigma(x_2 y) + c_{13} \Sigma(x_3 y) \\
b_2 &= c_{12} \Sigma(x_1 y) + c_{22} \Sigma(x_2 y) + c_{23} \Sigma(x_3 y) \\
b_3 &= c_{13} \Sigma(x_1 y) + c_{23} \Sigma(x_2 y) + c_{33} \Sigma(x_3 y)
\end{aligned} \qquad (22.88)$$

as is immediately evident on substitution. The values of the c's are those we have denoted
earlier in the chapter by determinantal forms, e.g. $c_{jk} = \Delta_{jk}/\Delta$.


22.28. Now suppose that we wish to discard the variate $x_3$. From (22.86) with
1, 0, 0 written on the right, we find

$$c_{12} = -\frac{1}{\Delta} \begin{vmatrix} (11) & (13) & 1 \\ (12) & (23) & 0 \\ (13) & (33) & 0 \end{vmatrix} \qquad (22.89)$$

where $(jk)$ stands for $\Sigma(x_j x_k)$, and

$$\Delta = \begin{vmatrix} (11) & (12) & (13) \\ (12) & (22) & (23) \\ (13) & (23) & (33) \end{vmatrix} \qquad (22.90)$$

There are similar expressions for the other c's. If the values of the constants when $x_3$
is removed are $c'_{11}$, $c'_{12}$, $c'_{22}$ we shall have

$$c'_{11} = -\frac{1}{\Delta'} \begin{vmatrix} (12) & 1 \\ (22) & 0 \end{vmatrix}, \qquad c'_{12} = \frac{1}{\Delta'} \begin{vmatrix} (11) & 1 \\ (12) & 0 \end{vmatrix}, \quad \text{etc.} \qquad (22.91)$$

where

$$\Delta' = \begin{vmatrix} (11) & (12) \\ (12) & (22) \end{vmatrix}. \qquad (22.92)$$

Now, writing each c as the corresponding cofactor of $\Delta$ divided by $\Delta$, we have

$$\Delta^2 (c_{12} c_{33} - c_{13} c_{23}) = -\begin{vmatrix} (12) & (13) \\ (23) & (33) \end{vmatrix} \begin{vmatrix} (11) & (12) \\ (12) & (22) \end{vmatrix} + \begin{vmatrix} (12) & (13) \\ (22) & (23) \end{vmatrix} \begin{vmatrix} (11) & (13) \\ (12) & (23) \end{vmatrix} = -(12)\,\Delta,$$

while $c_{33} = \Delta'/\Delta$. Thus

$$c_{12} - \frac{c_{13} c_{23}}{c_{33}} = \frac{c_{12} c_{33} - c_{13} c_{23}}{c_{33}} = -\frac{(12)\,\Delta}{\Delta \Delta'} = c'_{12}. \qquad (22.93)$$

Similarly

$$c'_{11} = c_{11} - \frac{c_{13}^2}{c_{33}} \qquad (22.94)$$

$$c'_{22} = c_{22} - \frac{c_{23}^2}{c_{33}} \qquad (22.95)$$


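This gives us the new c's in terms of the old. Denoting similarly the new b's by primes,
we have

$$\begin{aligned}
b_1 - b'_1 &= (c_{11} - c'_{11}) \Sigma(x_1 y) + (c_{12} - c'_{12}) \Sigma(x_2 y) + c_{13} \Sigma(x_3 y) \\
&= \frac{1}{c_{33}} \{ c_{13}^2 \Sigma(x_1 y) + c_{13} c_{23} \Sigma(x_2 y) + c_{13} c_{33} \Sigma(x_3 y) \} \\
&= \frac{c_{13} b_3}{c_{33}}.
\end{aligned}$$

Hence we have

$$b'_1 = b_1 - \frac{c_{13} b_3}{c_{33}}, \qquad b'_2 = b_2 - \frac{c_{23} b_3}{c_{33}}, \qquad (22.96)$$

expressing the new constants in terms of the old and the known constants c.
Finally, the contribution to the sum of squares due to the variate $x_3$ is

$$\begin{aligned}
& b_1 \Sigma(x_1 y) + b_2 \Sigma(x_2 y) + b_3 \Sigma(x_3 y) - b'_1 \Sigma(x_1 y) - b'_2 \Sigma(x_2 y) \\
&\quad = \frac{c_{13} b_3}{c_{33}} \Sigma(x_1 y) + \frac{c_{23} b_3}{c_{33}} \Sigma(x_2 y) + b_3 \Sigma(x_3 y) \\
&\quad = \frac{b_3^2}{c_{33}}. \qquad (22.97)
\end{aligned}$$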


22.29. Generally, if there are p independent variates the equations for the b's are

$$\begin{aligned}
b_1 \Sigma(x_1^2) + b_2 \Sigma(x_1 x_2) + \ldots + b_p \Sigma(x_1 x_p) &= \Sigma(y x_1) \\
&\;\;\vdots \\
b_1 \Sigma(x_1 x_p) + b_2 \Sigma(x_2 x_p) + \ldots + b_p \Sigma(x_p^2) &= \Sigma(y x_p).
\end{aligned}$$

If $x_p$ is omitted the equations become $(p - 1)$ in number in variables $b'_1 \ldots b'_{p-1}$.
Subtracting from these the first $(p - 1)$ of the above equations we find $(p - 1)$ equations,
typified by

$$(b'_1 - b_1) \Sigma(x_1 x_j) + (b'_2 - b_2) \Sigma(x_2 x_j) + \ldots + (b'_{p-1} - b_{p-1}) \Sigma(x_{p-1} x_j) - b_p \Sigma(x_j x_p) = 0 \qquad (22.98)$$

But these equations are the same as those for the coefficients $c_{1p} \ldots c_{pp}$ with $(b'_1 - b_1)$
in place of $c_{1p}$, etc., and $-b_p$ in place of $c_{pp}$. Hence

$$\frac{b'_1 - b_1}{-b_p} = \frac{c_{1p}}{c_{pp}}$$

or

$$b'_1 - b_1 = -\frac{c_{1p}\, b_p}{c_{pp}}. \qquad (22.99)$$

Similarly it will be found that

$$c'_{11} = c_{11} - \frac{c_{1p}^2}{c_{pp}}, \qquad c'_{12} = c_{12} - \frac{c_{1p}\, c_{2p}}{c_{pp}}, \qquad (22.100)$$

with similar equations for the other c's.
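
The removal of a variate by (22.99) and (22.100) is a purely arithmetical adjustment, which the following Python sketch illustrates. The function name `drop_last_variate` is our own; the matrix `c` is assumed to be the full symmetric array of the constants $c_{jk}$, with the variate to be discarded placed last, and the third returned value assumes that the contribution $b_p^2/c_{pp}$ of (22.97) carries over from three variates to p.

```python
def drop_last_variate(c, b):
    """Adjust the constants c and coefficients b when the last variate is removed.

    c : p x p symmetric list of lists of the constants c_jk
    b : list of the p regression coefficients
    Returns (new_c, new_b, ss_due_to_dropped_variate).
    """
    p = len(c)
    last = p - 1
    c_pp = c[last][last]
    # (22.100): c'_jk = c_jk - c_jp c_kp / c_pp
    new_c = [[c[j][k] - c[j][last] * c[k][last] / c_pp for k in range(last)]
             for j in range(last)]
    # (22.99): b'_j = b_j - c_jp b_p / c_pp
    new_b = [b[j] - c[j][last] * b[last] / c_pp for j in range(last)]
    # Contribution of the dropped variate to the sum of squares: b_p^2 / c_pp
    ss_due = b[last] ** 2 / c_pp
    return new_c, new_b, ss_due

# A small made-up case: sums of squares and products (11) = (22) = 2, (12) = 1,
# and sums of products with y both equal to 3, so that b_1 = b_2 = 1.
c_full = [[2.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 2.0 / 3.0]]
b_full = [1.0, 1.0]
c_reduced, b_reduced, ss_due = drop_last_variate(c_full, b_full)
```

In this made-up case the reduced regression has the single constant 1/(11) = 0.5 and coefficient 3/2, which is what the adjustment returns.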


22.30. Somewhat similar results apply when a variate is added. If primes again
refer to new coefficients when $x_q$ is added, we have, as above,

$$b'_1 - b_1 = \frac{c'_{1q}\, b'_q}{c'_{qq}}, \qquad c_{11} = c'_{11} - \frac{c'^{\,2}_{1q}}{c'_{qq}}, \qquad c_{12} = c'_{12} - \frac{c'_{1q}\, c'_{2q}}{c'_{qq}}. \qquad (22.101)$$

In order to use these equations to adjust the constants we require $c'_{1q} \ldots c'_{qq}$ and $b'_q$.
By writing down the equations satisfied by $c_{11} \ldots c_{1p}$ and subtracting the corresponding
equations in $c'_{11} \ldots c'_{1q}$, we get p equations such as

$$(c'_{11} - c_{11}) \Sigma(x_1 x_j) + \ldots + (c'_{1p} - c_{1p}) \Sigma(x_j x_p) = -c'_{1q} \Sigma(x_j x_q).$$

These are the same as the equations in $b_1 \ldots b_p$ with $-c'_{1q} \Sigma(x_j x_q)$ instead of $\Sigma(x_j y)$
on the right, and hence

$$c'_{11} - c_{11} = -c'_{1q} \sum_{j=1}^{p} c_{1j} \Sigma(x_j x_q).$$

Thus, using (22.101),

$$\frac{c'_{1q}}{c'_{qq}} = -\sum_{j=1}^{p} c_{1j} \Sigma(x_j x_q). \qquad (22.102)$$

The last of the equations satisfied by $c'_{1q} \ldots c'_{qq}$ is

$$c'_{1q} \Sigma(x_q x_1) + \ldots + c'_{pq} \Sigma(x_q x_p) + c'_{qq} \Sigma(x_q^2) = 1.$$

Substituting for $c'_{1q}$, etc., in terms of $c'_{qq}$, we get

$$c'_{qq} \Big\{ \Sigma(x_q^2) - \sum_{j,k=1}^{p} c_{jk} \Sigma(x_j x_q) \Sigma(x_k x_q) \Big\} = 1. \qquad (22.103)$$

This gives $c'_{qq}$, and $c'_{1q} \ldots$ are derivable from (22.102). The other constants then
result from (22.101).
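
The order of operations for adding a variate may be set out as a short Python sketch. The function name `add_variate` and its arguments are our own. The expression used for $b'_q$, namely $c'_{qq}\{\Sigma(x_q y) - \sum_k b_k \Sigma(x_k x_q)\}$, is our rearrangement of $b'_q = \sum_j c'_{jq} \Sigma(x_j y)$ by means of (22.102), not a formula quoted from the text.

```python
def add_variate(c, b, s, s_qq, g_q):
    """Adjust the constants c and coefficients b when a new variate x_q is added.

    c    : p x p symmetric list of lists of the old constants c_jk
    b    : list of the p old regression coefficients
    s    : list of the sums of products S(x_j x_q), j = 1..p
    s_qq : sum of squares S(x_q^2)
    g_q  : sum of products S(x_q y)
    Returns (new_c, new_b), of order p + 1.
    """
    p = len(c)
    u = [sum(c[j][k] * s[k] for k in range(p)) for j in range(p)]
    # (22.103): c'_qq {S(x_q^2) - sum_jk c_jk S(x_j x_q) S(x_k x_q)} = 1
    c_qq = 1.0 / (s_qq - sum(s[j] * u[j] for j in range(p)))
    # (22.102): c'_jq / c'_qq = - sum_k c_jk S(x_k x_q)
    ratio = [-u[j] for j in range(p)]
    # New coefficient of x_q (our rearrangement, see the note above).
    b_q = c_qq * (g_q - sum(b[k] * s[k] for k in range(p)))
    # (22.101): c'_jk = c_jk + c'_jq c'_kq / c'_qq ;  b'_j = b_j + c'_jq b'_q / c'_qq
    new_c = [[c[j][k] + ratio[j] * ratio[k] * c_qq for k in range(p)]
             + [ratio[j] * c_qq] for j in range(p)]
    new_c.append([ratio[k] * c_qq for k in range(p)] + [c_qq])
    new_b = [b[j] + ratio[j] * b_q for j in range(p)] + [b_q]
    return new_c, new_b

# A small made-up case: one variate with (11) = 2 and S(x_1 y) = 3, to which a
# second variate with (12) = 1, (22) = 2 and S(x_2 y) = 3 is added.
c_new, b_new = add_variate([[0.5]], [1.5], [1.0], 2.0, 3.0)
```

Applied to the four-variate constants of Example 22.10, the same function is intended to reproduce the quantities found there by hand.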

Cochran (1938a), to whom this proof is due, says that the elimination of two variates 
is best carried out in two stages of one each; that where one variate is eliminated the 
method is quicker than re-solving the regression equations, except where there are only 
two independent variates in the first instance ; and that if two variates are being eliminated 
the method is quicker if the original number of independent variates is six or more. For 
the addition of variates the method is in all cases more expeditious than re-solving the 
regression equations.

Example 22.10 (Cochran, 1938a)

In a study of the effect of weather factors on the number of noctuid moths per night
caught in a light-trap, regressions were worked out on $X_1$ (minimum night temperature),
$X_2$ (the maximum temperature of the previous day), $X_3$ (the average speed of the wind
during the night), and $X_4$ (the amount of rain during the night). The dependent variate
was log (1 + n), where n was the number of moths.

It was subsequently decided to investigate the effect of cloudiness, measured on a
conventional scale as the percentage of starlight obscured by clouds in a night sky camera.
This is the new variate $X_5$.

The quantities $c_{jk}$ for the first four variates were:

              X_1               X_2               X_3               X_4

    X_1   + 0.105,423,56    - 0.041,946,20    - 0.096,067,09    - 0.018,490,96
    X_2        ...          + 0.086,038,69    + 0.033,172,71    + 0.012,903,58
    X_3        ...               ...          + 0.572,652,01    + 0.008,116,62
    X_4        ...               ...               ...          + 0.062,275,32

and the sums $\Sigma(x_j x_5)$ were

$$\Sigma(x_1 x_5) = -4.867, \quad \Sigma(x_2 x_5) = +0.206, \quad \Sigma(x_3 x_5) = -0.5446,$$

$$\Sigma(x_4 x_5) = -5.42, \quad \Sigma(x_5^2) = 7.87.$$

We then find from (22.103)

$$c'_{55} = +0.210,133,14,$$

and from (22.102)

$$\frac{c'_{15}}{c'_{55}} = +0.369,198,24, \quad \frac{c'_{25}}{c'_{55}} = -0.133,872,86, \quad \frac{c'_{35}}{c'_{55}} = -0.118,533,74,$$

$$\frac{c'_{45}}{c'_{55}} = +0.249,298,91,$$

so that the new c's are given by (22.101) as

              X_1               X_2               X_3               X_4               X_5

    X_1   + 0.134,066,25    - 0.052,332,16    - 0.105,263,03    + 0.000,849,84    + 0.077,580,79
    X_2        ...          + 0.089,804,68    + 0.036,507,20    + 0.005,890,52    - 0.028,131,12
    X_3        ...               ...          + 0.575,604,43    + 0.001,907,12    - 0.024,907,87
    X_4        ...               ...               ...          + 0.075,335,08    + 0.052,385,96
    X_5        ...               ...               ...               ...          + 0.210,133,14

The original regression coefficients were

$$b_1 = +0.198,140,7, \quad b_2 = +0.038,528,4, \quad b_3 = -0.508,649,2, \quad b_4 = +0.031,848,2.$$

We now find

$$b'_5 = \sum_{j=1}^{5} \{ c'_{j5} \Sigma(x_j y) \} = -0.227,149,6,$$

and from (22.101) we then have

$$b'_1 = +0.114,277,5, \quad b'_2 = +0.068,937,6, \quad b'_3 = -0.481,724,3, \quad b'_4 = -0.024,779,9.$$

As usual we have retained more figures than are necessary, in order to avoid cumulating
errors and to facilitate the detection of computational slips.





22.31. The constants c found in the foregoing method have a further use: they
give the standard errors of the regression coefficients and provide some of the functions
required in more exact tests based on the t-distribution. If, measuring y about the mean,
we have

$$Y = b_1 X_1 + b_2 X_2 + \ldots + b_p X_p,$$

then there are p equations of the kind:

$$\Sigma(x_1 y) = b_1 \Sigma x_1^2 + b_2 \Sigma(x_1 x_2) + \ldots + b_p \Sigma(x_1 x_p),$$

and thus, recalling the definition of the c's, we have

$$b_1 = c_{11} \Sigma(x_1 y) + c_{12} \Sigma(x_2 y) + \ldots + c_{1p} \Sigma(x_p y).$$

Thus, for fixed values of the x's,

$$\operatorname{var} b_1 = \operatorname{var} \Big\{ \sum_{j=1}^{p} c_{1j} \Sigma(x_j y) \Big\} = c_{11} \operatorname{var} y, \qquad (22.104)$$

and so for the other b's.

For large samples var y may be taken to be the estimated variance

$$\frac{1}{n - p - 1} \Sigma (y - Y)^2.$$

If the sample is small and it is desired to make a more accurate test, then we have,
by an extension of 22.21, that

$$t = \frac{(b_1 - \beta_1) \sqrt{n - p - 1}}{\sqrt{\Sigma (y - Y)^2}\, \sqrt{c_{11}}}, \qquad (22.105)$$

where $\beta_1$ is the parent value of the coefficient, is distributed in "Student's" form with
$\nu = n - p - 1$ degrees of freedom.


22.32. As a final comment we may emphasise that regression equations are only 
polynomials fitted to the means of arrays, and consequently that if the scatter about 
those means is substantial they are not very reliable as estimators (though they may be 
better than other methods). The comment would hardly be necessary were it not for a 
tendency to use the equations somewhat uncritically for purposes of prediction. The 
point assumes even greater importance when attempts are made to estimate the dependent 
variate for values of the independent variates outside the range on which the regressions 
are based; or again, if the observations are distributed over time so that the population 
may be changing while the sample is being drawn. The technique of regression analysis 
is undoubtedly useful in many fields, but—as with many other statistical techniques— 
the careful investigator will apply it with a certain amount of self-discipline. 


NOTES AND REFERENCES 

The theory of curvilinear regression was studied by Karl Pearson (1905). Orthogonal 
polynomials had been considered, and the essential problems solved, by Tchebycheff as 
far back as 1857, but their use in statistics was not fully appreciated until about sixty years 
later. Pearson gave in 1921 the general formulae for fitting curved regression lines up to 
the fourth order. Neyman (1926) pointed out the elegance of the determinantal approach. 

From about 1920 onwards there may be discerned two main lines of development.
The Scandinavian school, led by Wicksell, has developed the analytical theory of regression;
see Wicksell (1917b, 1933, 1934b) and a useful memoir by W. Andersson (1932). The
second line, followed by Fisher, Aitken and others, has been concerned with the fitting of
regression curves to arithmetical data and exact significance tests; see Fisher's papers of
1921b, 1922b, 1924b, 1926a, a paper by Allan (1930), and three papers by Aitken (1933a,
b, c). The literature on orthogonal polynomials is now very large.

For some illustrative material, see K. Pearson (1905), Andersson (1932), and Pretorius 
(1930). See also references to Chapters 14 and 15. 


EXERCISES 

22.1. Show that the regression of y on the variance of x (the scedastic curve) is 
given by 

(X) D J ~* j 




where 


(Wicksell, 1934b.)

22.2. Show that if the regression of y on the mean of x is linear, then from (22.11) 


is a linear function of (f> (tj and -j— <f> (^). Hence that 

dti 

Kji K 2 o = K xl Kj+\ t o 


(Wicksell, 1934b.)


22.3. Show that if the marginal distribution of a bivariate distribution is of the 
Gram-Charlier Type A: 

/ = a (x) {1 + a 3 //, + a 4 H t + . . . } 
the regression of y on x is 

30 X 

Y _ j—o fc"-o 3 1 _ 

1 +2^ajHj(X) 

(Wicksell, 1917b.)


22.4. Transforming the orthogonal polynomials of (22.74) to a new variate 

Tt _ 1 

| = X --—, note that P p — £P p -i is a numerical multiple of P p _ 2 , say AP„_ a . Show 

h 

that 


and deduce the recurrence relation, 


Pp — fPp-i — 


X=- E ^ 

(P - ]V {n g - (P - l) 8 } p 

4 (2 p - 1) (2 p - 3) * 


(Allan, 1930. The relation is due to Tchebycheff.)

22.5. A regression line

$$Y = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + a_4 X^4$$

is fitted to normal data and the number of observations N is large. If r is the correlation 

„'8 

between the variates and c — — (the moments referring to the 2 -variate), show that 


var 

o« 

= w (45 + 30c * 

- 8c 8 + c«) (1 - 

■r*) 

var 



- 15c* + 4c 8 ) (1 

— r*) 

var 

a* 

-^ (4 - 3c+: 

Sc*) (1 - r*) 


var 

a. 

-S* <>+«■»<» 

-'•) 


var 

a* 





(Andersson, 1932.) 


22.6. In the notation of 22.31 show that 

$$\operatorname{cov}(b_1, b_2) = c_{12} \operatorname{var} y$$

and hence show how to test the difference of two coefficients in a regression equation. 

22.7. Show how to derive a test of the significance of the difference of corresponding 
regression coefficients in two equations derived from independent samples, based on the 
result of 21.26.

THE ANALYSIS OF VARIANCE (1)


23.1. At various points in this book we have encountered in different guises the
result that the sum of squares of a set of observations about their mean can be represented
as the sum of two independent sums of squares, each of which provides an estimate of
the parent variance; and that their ratio provides a test of homogeneity, at least when
the parent is normal. We now proceed to study in more detail a method of statistical
analysis with considerable generality which springs from this result. In view of the
complexity of the general case we shall begin by considering simpler cases under somewhat
restrictive conditions and shall extend our results stage by stage.

One-way Classification

23.2. Suppose we have a set of variate-values divided into p families:

$$\begin{matrix}
x_{11} & x_{21} & \ldots & x_{n_1 1} \\
x_{12} & x_{22} & \ldots & x_{n_2 2} \\
\vdots & & & \\
x_{1p} & x_{2p} & \ldots & x_{n_p p}
\end{matrix}$$

Denoting by $\bar{x}$ the mean of the whole set and by $\bar{x}_j$ the mean of the values in the jth family,
we have the identity

$$\sum_{i,j} (x_{ij} - \bar{x})^2 = \sum_{i,j} (x_{ij} - \bar{x}_j + \bar{x}_j - \bar{x})^2 = \sum_{i,j} (x_{ij} - \bar{x}_j)^2 + \sum_{i,j} (\bar{x}_j - \bar{x})^2, \qquad (23.1)$$

since the cross-product term $\sum_{i,j} (x_{ij} - \bar{x}_j)(\bar{x}_j - \bar{x})$ vanishes. We may also write this as

$$\sum_{i,j} (x_{ij} - \bar{x})^2 = \sum_{i,j} (x_{ij} - \bar{x}_j)^2 + \sum_j n_j (\bar{x}_j - \bar{x})^2, \qquad (23.2)$$

where $n_j$ is the number of members in the jth family.

It will also be convenient, from the point of view of a later generalisation, to write
the mean of the jth family as $x_{\cdot j}$ and that of the whole as $x_{\cdot\cdot}$, the periods in the subscripts
showing which factor is being averaged. We have then the alternative form

$$\sum_{i,j} (x_{ij} - x_{\cdot\cdot})^2 = \sum_{i,j} (x_{ij} - x_{\cdot j})^2 + \sum_j n_j (x_{\cdot j} - x_{\cdot\cdot})^2. \qquad (23.3)$$

23.3. The problem we shall discuss in connection with families of values of this type
takes some such form as the following: the members of each family are randomly chosen
from some parent population corresponding to that family. The populations themselves
are, as a rule, defined by some prior system of classification given among the data of the
problem, e.g. they might be different varieties of wheat, the x's being the yields of the
varieties grown under similar conditions, or they might be defined by income levels and
the x's the expenditure on food of a sample chosen from the different income groups. We
now ask: is there any evidence that the factor measured by x varies significantly from
family to family? Alternatively, can the data be regarded as homogeneous, i.e. as emanating
from populations which are identical so far as concerns the factor measured by x?
Further, when the question of significance is decided, how can we estimate the variation
of x in families or groups of families, and how can we estimate the magnitude of any
differences which exist?


23.4. We will assume, until further notice, that within each family the variation
is normal with variance v, and that v is the same for each family. In later sections we
shall endeavour to remove these rather restrictive conditions. On our present hypothesis
the populations corresponding to the different families can differ, if at all, only in their
means, and our first question is whether the sample values afford any evidence of such
differences.

Let us take as our hypothesis that the parent populations have a common mean m.
Then we recall the following facts:

(1) The sum $\frac{1}{v} \sum (x_{ij} - x_{\cdot\cdot})^2$ is distributed in the Type III form of $\chi^2$ with
$N - 1 = \sum (n_j) - 1$ degrees of freedom, that is to say as the sum of squares of $N - 1$
independent normal variates with zero mean and unit variance.

(2) In any given family $x_{\cdot j}$ is distributed normally about mean m (with unit variance
when measured in units of $\sqrt{v/n_j}$), and is independent of the sum $\frac{1}{v} \sum_i (x_{ij} - x_{\cdot j})^2$,
which is itself distributed as $\chi^2$ with $n_j - 1$ degrees of freedom.

Since on our hypothesis the observations may be regarded as a single sample from
the same population, it follows that

$$\begin{aligned}
\frac{1}{v} \sum_{i,j} (x_{ij} - x_{\cdot\cdot})^2 &\text{ is distributed as } \chi^2 \text{ with } N - 1 \text{ d.f.} \\
\frac{1}{v} \sum_{i,j} (x_{ij} - x_{\cdot j})^2 &\text{ is distributed as } \chi^2 \text{ with } \sum (n_j - 1) = N - p \text{ d.f.} \\
\frac{1}{v} \sum_{j} n_j (x_{\cdot j} - x_{\cdot\cdot})^2 &\text{ is distributed as } \chi^2 \text{ with } p - 1 \text{ d.f.}
\end{aligned} \qquad (23.4)$$

The only statement requiring any proof is the last. It may be proved directly (see Exercise
23.1), but we shall deduce it as the corollary of a general theorem due to R. A. Fisher which
will often be required in this chapter.


23.5. Suppose we have q variates $x_1 \ldots x_q$ which are independently and normally
distributed with unit variance about the same mean, which we may assume to be
zero. Put

$$\xi_r = \sum_{s=1}^{q} \lambda_{rs} x_s, \qquad r = 1 \ldots q. \qquad (23.5)$$

If we choose the coefficients $\lambda$ so that

$$\sum_{s=1}^{q} \lambda_{rs} \lambda_{ts} = \begin{cases} 1, & r = t \\ 0, & r \neq t \end{cases} \qquad (23.6)$$

then each $\xi$ is distributed normally with unit variance independently of the others. There
are $q^2$ coefficients $\lambda$, and the equations (23.6) impose $\frac{1}{2} q (q + 1)$ conditions on them, so that
the $\lambda$'s can always be found in a multiplicity of ways. In effect they correspond to the
rotation of orthogonal co-ordinate axes in a q-dimensional space.

Now suppose that we have h linear functions of the x's, $\xi_1 \ldots \xi_h$ $(h < q)$, whose
coefficients obey the orthogonality relations (23.6). These h variates are then distributed
independently, normally and with unit variance.

It is now possible to find $q - h$ further variates $\xi_{h+1} \ldots \xi_q$ which are orthogonal
among themselves and to $\xi_1 \ldots \xi_h$. Geometrically this is evident from the possibilities
of rotations in the q-way space. Algebraically it follows from the consideration that if
$qh$ of the $\lambda$'s in (23.6) are known, $q(q - h)$ are unknown, and the number of conditions
they must obey is

$$\tfrac{1}{2} q (q + 1) - \tfrac{1}{2} h (h + 1) = \tfrac{1}{2} (q - h)(q + h + 1),$$

so that values of the unknowns can be found in at least one way if

$$\tfrac{1}{2} (q + h + 1) \le q$$

or $h + 1 \le q$.

Now suppose we express a sum of squares of q normal variates with unit variance,
say A, as the sum of two quantities B and C; and suppose that B is distributed as the
sum of squares of h independent normal variates with unit variance which are linear
functions of the variates entering into A. Then we can find $q - h$ such variates independent
of the first h, and C must be their sum of squares. Further, the distributions
of B and C are independent. By an extension of the same argument, if

$$A = A_1 + A_2 + \ldots + A_k, \qquad (23.7)$$

A is distributed as $\chi^2$ with $\nu$ degrees of freedom, $A_1$ with $\nu_1$, $\ldots$, $A_{k-1}$ with $\nu_{k-1}$; and
if the variates entering into $A_1 \ldots A_{k-1}$ are mutually independent and are linear functions
of those entering into A, then $A_k$ is distributed as $\chi^2$ with $\nu_k$ degrees of freedom, where

$$\nu = \nu_1 + \nu_2 + \ldots + \nu_k \qquad (23.8)$$

and $A_k$ is independent of $A_1, \ldots A_{k-1}$.

23.6. As an extension and kind of converse of this theorem we have the result, due
to Cochran, that if $A_1 \ldots A_k$ are distributed as $\chi^2$ with $\nu_1 \ldots \nu_k$ degrees of freedom,
and their sum A is distributed as $\chi^2$ with $\nu = \sum (\nu_j)$ degrees, then $A_1 \ldots A_k$ are
independent. We will prove this for the case k = 2, the more general result following in a
similar way.

If the characteristic function of $A_1$ and $A_2$ is $\phi(t_1, t_2)$, we have, by hypothesis,

$$\phi(t_1, 0) = (1 - 2it_1)^{-\frac{1}{2}\nu_1}$$

$$\phi(0, t_2) = (1 - 2it_2)^{-\frac{1}{2}\nu_2}$$

and

$$\phi(t, t) = (1 - 2it)^{-\frac{1}{2}(\nu_1 + \nu_2)}.$$

Hence

$$\phi(t, t) = \phi(t, 0)\, \phi(0, t) = (1 - 2it)^{-\frac{1}{2}(\nu_1 + \nu_2)}$$

and thus $\phi(t, 0)$ and $\phi(0, t)$ are both divisible by a factor in $(1 - 2it)^{-1}$ and no other
factor in t because of the symmetry of $\phi(t_1, t_2)$. These factors are identified by $\phi(t_1, 0)$
and $\phi(0, t_2)$ as $(1 - 2it)^{-\frac{1}{2}\nu_1}$ and $(1 - 2it)^{-\frac{1}{2}\nu_2}$, and hence

$$\phi(t_1, t_2) = \phi(t_1, 0)\, \phi(0, t_2),$$

or $A_1$ and $A_2$ are independent.

23.7. Let us now return to the statements in (23.4). The sum $\frac{1}{v} \sum (x_{ij} - x_{\cdot\cdot})^2$ is
distributed as $\chi^2$ with $\nu = N - 1$. The sum $\frac{1}{v} \sum (x_{ij} - x_{\cdot j})^2$ is so distributed with
$\nu_1 = N - p$. Further, the quantities $x_{ij} - x_{\cdot j}$ may be transformed to $N - p$ independent
normal variates which are linear functions of the variates entering into the first sum. It
follows from 23.5 that because of the identity (23.3) the third sum $\frac{1}{v} \sum n_j (x_{\cdot j} - x_{\cdot\cdot})^2$ is
distributed as $\chi^2$ with $\nu_2 = (N - 1) - (N - p) = p - 1$ degrees of freedom, and that
independently of the second sum.

Thus we may exhibit our break-up of the total sum in the following form:


TABLE 23.1

Form of Analysis of Variance for One-way Classification.

                                               Sum of Squares.                                    d.f.       Quotient.

    Of family means about the mean of the      $\sum_j n_j (x_{\cdot j} - x_{\cdot\cdot})^2$      $p - 1$    $\frac{1}{p-1} \sum_j n_j (x_{\cdot j} - x_{\cdot\cdot})^2$
    whole

    Of individuals in families about the       $\sum_{i,j} (x_{ij} - x_{\cdot j})^2$              $N - p$    $\frac{1}{N-p} \sum_{i,j} (x_{ij} - x_{\cdot j})^2$
    respective family mean

    Of individuals about the mean of the       $\sum_{i,j} (x_{ij} - x_{\cdot\cdot})^2$           $N - 1$    $\frac{1}{N-1} \sum_{i,j} (x_{ij} - x_{\cdot\cdot})^2$
    whole


We note that the sums of squares and the degrees of freedom in the first two rows sum to
those in the third row (though the quantities in the quotient column are not additive).
This is the origin of the expression "analysis of variance," though, to be accurate, it is the
sum of squares of the total which is analysed.

To avoid cumbrous phrases we refer to the sum of squares of family means about
the mean of the whole as the sum of squares "between families," and to that of individuals
about the respective family-means (for the time being) as "residual." We shall also speak
of total sum of squares and total mean with the obvious significance, and denote degrees
of freedom by the initial letters "d.f."*

* The need has been felt for a word to denote "sum of squares about the mean". Professor
Pitman has suggested the word "squariance", though he seems to feel that this leaves something to
be desired. In my own notes I use the word "deviance" but have not ventured to introduce it into
the text.

23.8. Since the mean value of $\chi^2$ with $\nu$ degrees of freedom is $\nu$, the quotients in
Table 23.1 are all unbiassed estimators of v, the parent variance. Only the first two, however,
are independent. We recall that the ratio

$$z = \tfrac{1}{2} \log_e \frac{(N - p) \sum_j n_j (x_{\cdot j} - x_{\cdot\cdot})^2}{(p - 1) \sum_{i,j} (x_{ij} - x_{\cdot j})^2} \qquad (23.9)$$

is distributed in Fisher's form, which is independent of the variance v. This distribution
accordingly provides a convenient test of significance in the normal case.


Example 23.1 

Let us consider the application of the foregoing theory to a simple example which 
has been chosen to reduce the arithmetic to a small amount. The following shows the 
lives in hours of four batches of electric lamps :— 

Batch 1: 1600, 1610, 1650, 1680, 1700, 1720, 1800. 

Batch 2 : 1580, 1640, 1640, 1700, 1750. 

Batch 3 : 1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820. 

Batch 4 : 1510, 1520, 1530, 1570, 1600, 1680. 


We know that the batches were made from four different specimens of wire, but were
otherwise made under identical conditions. (This, of course, over-simplifies the problem as it
is encountered in practice, but will serve for purposes of illustration.) The question is,
do the batches differ among themselves in length of life? If so, we suspect that the quality
of wire is varying materially, and if the lamps are to be standardised as far as possible the
quality of wire must be made more uniform from batch to batch before manufacture is
undertaken. The numbers in this example are small, but not much smaller than would
be desirable in practice, owing to the expense and time involved in testing a lamp by running
it until it burns out.

The sums of $x$ and $x^2$ for the four batches will be found to be:

                Number in Sample.      Σ(x)          Σ(x²)

    Batch 1             7             11,760       19,785,400
      ,,  2             5              8,310       13,828,100
      ,,  3             8             13,090       21,503,700
      ,,  4             6              9,410       14,778,700

    Totals             26             42,570       69,895,900


Thus for the mean life of lamp in the four batches we have 11,760/7 = 1680;
8310/5 = 1662; 13,090/8 = 1636.25; 9410/6 = 1568.33. These certainly differ, but is
the variation such as cannot have arisen by mere sampling fluctuations?

We find

$$x_{\cdot\cdot} = 42,570/26 = 1637.3077.$$

Thus

$$\sum (x_{ij} - x_{\cdot\cdot})^2 = \sum x_{ij}^2 - N x_{\cdot\cdot}^2 = 69,895,900 - 69,700,189 = 195,711.$$

We also have

$$\sum_j n_j (x_{\cdot j} - x_{\cdot\cdot})^2 = \sum_j (n_j x_{\cdot j}^2) - N x_{\cdot\cdot}^2 = 44,360.$$


The analysis then takes the form:

                          Sum of Squares.     d.f.     Quotient.

    Between batches            44,360           3        14,787
    Residual                  151,351          22         6,880

    Totals                    195,711          25         7,828


We have

$$z = \tfrac{1}{2} \log_e \frac{14,787}{6880} = 0.383, \qquad \nu_1 = 3, \quad \nu_2 = 22.$$

The 5 per cent. point for these degrees of freedom is seen from the tables to be 0.5574.
The observed value is therefore not significant, and we conclude that, so far as this test is
concerned, there is nothing to throw doubt on the homogeneity of the group.
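
The arithmetic of this example is easily mechanised. The following Python sketch is offered only as an illustration; the function name `one_way_anova` and the keys of the dictionary it returns are our own. It computes the between-families and residual sums of squares of (23.3), their degrees of freedom and quotients, and the statistic z of (23.9) for the four batches of lamps.

```python
from math import log

def one_way_anova(families):
    """Sums of squares, degrees of freedom, quotients and z for a one-way classification."""
    N = sum(len(f) for f in families)            # total number of observations
    p = len(families)                            # number of families
    grand_mean = sum(sum(f) for f in families) / N
    means = [sum(f) / len(f) for f in families]  # family means
    between = sum(len(f) * (m - grand_mean) ** 2 for f, m in zip(families, means))
    residual = sum((x - m) ** 2 for f, m in zip(families, means) for x in f)
    total = between + residual                   # identity (23.3)
    q_between = between / (p - 1)
    q_residual = residual / (N - p)
    return {
        "between": between, "residual": residual, "total": total,
        "df_between": p - 1, "df_residual": N - p, "df_total": N - 1,
        "q_between": q_between, "q_residual": q_residual,
        "q_total": total / (N - 1),
        "z": 0.5 * log(q_between / q_residual),  # (23.9)
    }

# Lives in hours of the four batches of electric lamps.
batches = [
    [1600, 1610, 1650, 1680, 1700, 1720, 1800],
    [1580, 1640, 1640, 1700, 1750],
    [1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]
table = one_way_anova(batches)
```

The value `table["z"]` is then to be compared with the tabulated percentage point of z for 3 and 22 degrees of freedom, as in the text.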

Having decided, provisionally at least, to accept the hypothesis that the data are 
homogeneous, we may ask, what is the best estimate of the parent variance ? Our analysis 
has given three different estimates, viz. 14,787, 6880 and 7828. It seems natural to use
the last, which depends on the greatest number of degrees of freedom. 

With this value we find for the variance of the mean of samples of $n$ the value $7828/n$,
and for the corresponding standard error $88.48/\sqrt{n}$.

The greatest difference of means observed is that between the first and fourth batch,
1680 - 1568.33 = 111.67. The standard error of this difference is

\[ 88.48 \sqrt{\tfrac{1}{7} + \tfrac{1}{6}} = 49.2. \]

The observed difference is rather more than twice the standard error, but we cannot
conclude that it is significant on that account. In fact, we have picked out the greatest
difference for examination from the six possible comparisons of pairs, and the distribution of
the greatest difference must have a larger standard error than that of a difference chosen
at random, which is what we have found. Nevertheless the fact that even the greatest
difference is only slightly in excess of twice the standard error affords some general evidence
in support of the hypothesis of homogeneity.
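As a check on the figures just quoted, the standard error of the greatest difference may be recomputed. This is a small sketch, assuming the variance estimate 7828 and the batch sizes 7 and 6 of the first and fourth batches; the names are illustrative only.

```python
import math

variance_estimate = 7828            # total quotient, taken as the estimate of v
s = math.sqrt(variance_estimate)    # about 88.48

n_first, n_fourth = 7, 6
mean_first = 11760 / n_first        # 1680
mean_fourth = 9410 / n_fourth       # about 1568.33

difference = mean_first - mean_fourth
standard_error = s * math.sqrt(1 / n_first + 1 / n_fourth)
ratio = difference / standard_error

# Number of pairs among four batches from which the greatest difference was picked.
comparisons = math.comb(4, 2)
print(round(difference, 2), round(standard_error, 1), comparisons)
```

The ratio comes out a little above two, which is the sense in which the difference is "rather more than twice the standard error".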

We may also note that if a more accurate test of the difference of two means is required
the t-test may be invoked; but here also we must remember that we are testing the greatest
of a set of differences. Where there are only two families concerned, the analysis of variance
reduces to the t-test for the difference of sample means when variances of the parents are
assumed equal.


23.9. Suppose now that in the case of one classification we have applied a test by
means of the analysis of variance and have found that the hypothesis of homogeneity is
unacceptable, or, in plain English, that the parents do differ. Let us then consider the
alternative that the populations are still normal and that they differ in their means but
not in their variances.

At first sight this may seem a highly artificial assumption to make, for if the populations
differ in their means it is not unlikely that they may differ in other respects. This
is undoubtedly so, but if there is serious possibility of difference in variances their homogeneity
may be discussed separately by means of tests we shall consider in Chapter 26.
Apart from this, there often arise in practice situations in which approximate equality of 
variance is plausible on prior grounds. For instance, we may be testing the effect of 
manuring on cereal yields, and it is reasonable to suppose that if the manure exerts any 
effect at all it will increase all plants of the same variety to about the same extent—that 
it will, in fact, displace the location of the distribution of yields without affecting 
its dispersion. 


23.10. The question we have now to consider is whether we can make an estimate
of the common variance of the populations. A little thought will show that we can. The
reasoning which led to the conclusion that the residual sum of squares is distributed as
$v\chi^2$ with $N - p$ degrees of freedom remains unchanged, so that the residual quotient in
Table 23.1 continues to provide an estimator of $v$. The other two no longer do so. Consider,
in fact, the sum of squares between families, and let the mean of the jth family be
$m_{.j}$. Then we have

\[
\begin{aligned}
E \sum_j n_j (x_{.j} - x_{..})^2
  &= E \sum_j n_j \{ (x_{.j} - m_{.j}) - (x_{..} - m_{..}) + (m_{.j} - m_{..}) \}^2 \\
  &= E \sum_j n_j \{ x_{.j} - m_{.j} - (x_{..} - m_{..}) \}^2 + \sum_j n_j (m_{.j} - m_{..})^2 .
\end{aligned}
\tag{23.10}
\]

Here $m_{..}$ is the mean $\frac{1}{N} \sum_j n_j m_{.j}$ and hence $x_{.j} - m_{.j}$ has the mean $x_{..} - m_{..}$. Thus
$\sum_j n_j \{ x_{.j} - m_{.j} - (x_{..} - m_{..}) \}^2$ is distributed as $v\chi^2$ with $p - 1$ degrees of freedom and

\[ E \sum_j n_j (x_{.j} - x_{..})^2 = (p - 1) v + \sum_j n_j (m_{.j} - m_{..})^2 . \tag{23.11} \]

Not unless $m_{.j} = m_{..}$ for every j (that is, all populations have the same mean) does the expression
on the right reduce to $(p - 1) v$, and hence the quotient between families give an unbiassed
estimator of $v$. In other cases it is greater.

Similarly,

\[
\begin{aligned}
E \sum_{i,j} (x_{ij} - x_{..})^2
  &= E \sum_{i,j} \{ x_{ij} - m_{.j} - (x_{..} - m_{..}) \}^2 + \sum_j n_j (m_{.j} - m_{..})^2 \\
  &= (N - 1) v + \sum_j n_j (m_{.j} - m_{..})^2 .
\end{aligned}
\tag{23.12}
\]

The expectation of the difference of the two terms considered in (23.11) and (23.12)
confirms that the residual sum of squares provides an estimator of $(N - p) v$.


23.11. A comparison of the formulae we have already reached and those of section
14.31 will show that the study of intra-class correlation is very closely related to the analysis
of variance. It is an interesting exercise to derive the z-test directly from the sampling
distribution of intra-class r given in equation (14.110) (vol. I, p. 362) and vice-versa.


Two-way Classification 

23.12. We proceed to the case when the variate-values belong not to one of a single
set of families but to two, say A and B. In the first instance we shall consider the situation
when there is only a single value in the jth class of A and the kth class of B. Our sample
may then be set out in the tabular form:

                                        Class B

Class A      $B_1$       $B_2$       $B_3$      ...      $B_q$        Totals

 $A_1$      $x_{11}$    $x_{12}$    $x_{13}$    ...     $x_{1q}$     $q x_{1.}$
 $A_2$      $x_{21}$    $x_{22}$    $x_{23}$    ...     $x_{2q}$     $q x_{2.}$
 $A_3$      $x_{31}$    $x_{32}$    $x_{33}$    ...     $x_{3q}$     $q x_{3.}$
  ...
 $A_p$      $x_{p1}$    $x_{p2}$    $x_{p3}$    ...     $x_{pq}$     $q x_{p.}$

 Totals    $p x_{.1}$  $p x_{.2}$  $p x_{.3}$   ...    $p x_{.q}$    $pq x_{..}$

                                                                        (23.13)


This is not a contingency table. The numbers $x_{jk}$ are variate-values, not frequencies.
As usual, $x_{j.}$ signifies the mean of values in the class $A_j$ and $x_{.k}$ the mean of values in the
class $B_k$, $x_{..}$ being the mean of the whole.

We have the algebraic identity

\[
\begin{aligned}
\sum_{j,k} (x_{jk} - x_{..})^2
  &= \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..} + x_{j.} - x_{..} + x_{.k} - x_{..})^2 \\
  &= \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + \sum_{j,k} (x_{j.} - x_{..})^2 + \sum_{j,k} (x_{.k} - x_{..})^2 \\
  &= \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + q \sum_j (x_{j.} - x_{..})^2 + p \sum_k (x_{.k} - x_{..})^2 ,
\end{aligned}
\tag{23.14}
\]

the cross-product terms vanishing on summation in the usual way.
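The identity (23.14) is purely algebraic and may be checked on any rectangular table of numbers. The sketch below does so in exact rational arithmetic; the small 3 x 4 table is an arbitrary illustration, not data from the text, and the function name is hypothetical.

```python
from fractions import Fraction

def two_way_sums_of_squares(x):
    """Return (total, between_A, between_B, residual) for a p x q table x[j][k]."""
    p, q = len(x), len(x[0])
    grand = Fraction(sum(map(sum, x)), p * q)                       # x..
    row_means = [Fraction(sum(row), q) for row in x]                # x_j.
    col_means = [Fraction(sum(x[j][k] for j in range(p)), p)        # x_.k
                 for k in range(q)]
    total = sum((x[j][k] - grand) ** 2
                for j in range(p) for k in range(q))
    between_a = q * sum((m - grand) ** 2 for m in row_means)
    between_b = p * sum((m - grand) ** 2 for m in col_means)
    residual = sum((x[j][k] - row_means[j] - col_means[k] + grand) ** 2
                   for j in range(p) for k in range(q))
    return total, between_a, between_b, residual

# Arbitrary illustrative table with p = 3 rows (A-classes) and q = 4 columns (B-classes).
table = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]
total, between_a, between_b, residual = two_way_sums_of_squares(table)
identity_holds = (total == between_a + between_b + residual)
print(identity_holds)
```

Because fractions are used throughout, the equality is tested exactly rather than to within rounding error.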


23.13. We are interested in the variation of the x’s according to class membership. 
Let us take as our hypothesis that the pq values are homogeneous, that is to say that they 
all emanate from (normal) populations with the same mean m and variance v. In such 
a case class-membership exerts no influence on variate-values, and the observed differences 
are pure sampling effects. 

The expression on the left in (23.14) is then distributed as $v\chi^2$ with $pq - 1$ degrees
of freedom. The mean $x_{j.}$ is distributed normally with variance $v/q$ and thus
$\sum_j q (x_{j.} - x_{..})^2$ is distributed as $v\chi^2$ with $p - 1$ d.f. Similarly,
$\sum_k p (x_{.k} - x_{..})^2$ is so distributed with $q - 1$ d.f. Finally the remaining term on the
right is distributed as $v\chi^2$ with $(p - 1)(q - 1)$ d.f.; for each term is normal with variance
$\frac{(p - 1)(q - 1)}{pq} v$, since

\[
x_{jk} - x_{j.} - x_{.k} + x_{..}
  = x_{jk} \left( 1 - \frac{1}{q} - \frac{1}{p} + \frac{1}{pq} \right)
  - \sum_{l \neq k} x_{jl} \left( \frac{1}{q} - \frac{1}{pq} \right)
  - \sum_{m \neq j} x_{mk} \left( \frac{1}{p} - \frac{1}{pq} \right)
  + \frac{1}{pq} \sum_{m \neq j,\ l \neq k} x_{ml} ,
\]

so that the sum of squares of coefficients on the right is

\[
\frac{(p - 1)(q - 1)}{(pq)^2} \{ (p - 1)(q - 1) + (p - 1) + (q - 1) + 1 \} = \frac{(p - 1)(q - 1)}{pq} .
\tag{23.15}
\]
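The value in (23.15) can be confirmed by writing out the coefficient with which each observation enters one residual term and summing the squares. This is a sketch in exact arithmetic; the function names are hypothetical and the check is run for a few small values of p and q only.

```python
from fractions import Fraction

def residual_coefficients(p, q, j=0, k=0):
    """Coefficient of each x_ml in x_jk - x_j. - x_.k + x.. for a p x q table."""
    coeffs = {}
    for m in range(p):
        for l in range(q):
            c = Fraction(1, p * q)        # contribution of + x..
            if m == j:
                c -= Fraction(1, q)       # contribution of - x_j.
            if l == k:
                c -= Fraction(1, p)       # contribution of - x_.k
            if m == j and l == k:
                c += 1                    # contribution of x_jk itself
            coeffs[(m, l)] = c
    return coeffs

def sum_of_squared_coefficients(p, q):
    """Sum of squares of the coefficients; the term's variance is this times v."""
    return sum(c * c for c in residual_coefficients(p, q).values())

checks = {
    (p, q): sum_of_squared_coefficients(p, q) == Fraction((p - 1) * (q - 1), p * q)
    for p in range(2, 6) for q in range(2, 6)
}
print(all(checks.values()))
```

Multiplying the result by the pq terms in the residual sum gives the expectation $(p - 1)(q - 1) v$, in agreement with the count of degrees of freedom.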


Thus, since there are $p + q - 1$ linear relations connecting the $pq$ quantities

\[ x_{jk} - x_{j.} - x_{.k} + x_{..} , \]

their sum of squares is distributed as $v\chi^2$ with $pq - (p + q - 1) = (p - 1)(q - 1)$ degrees
of freedom, which checks against the mean value of the individual square given by (23.15).
We may thus analyse the variance in the following way:


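TABLE 23.2

Form of Analysis of Variance for Two-way Classification with One Member in each Subclass

                     Sums of Squares.                                       d.f.              Quotient.

Between A-classes    $q \sum_j (x_{j.} - x_{..})^2$                         $p - 1$           $\frac{q}{p - 1} \sum_j (x_{j.} - x_{..})^2$

Between B-classes    $p \sum_k (x_{.k} - x_{..})^2$                         $q - 1$           $\frac{p}{q - 1} \sum_k (x_{.k} - x_{..})^2$

Residual             $\sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$     $(p - 1)(q - 1)$  $\frac{1}{(p - 1)(q - 1)} \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$

Totals               $\sum_{j,k} (x_{jk} - x_{..})^2$                       $pq - 1$          $\frac{1}{pq - 1} \sum_{j,k} (x_{jk} - x_{..})^2$
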


The sums of squares and degrees of freedom (but not the quotients) are additive as
before. It follows from the theorem of 23.6 that the three constituent sums are independent.
Each quotient provides an unbiassed estimator of $v$.

23.14. Our use of these results proceeds by an easy generalisation of the method
exemplified in Example 23.1. We take as our hypothesis the supposition that all samples
are from normal populations with identical mean and variance. Comparison of the estimates
in the quotient column then provides a test of significance. (If the hypothesis is
rejected we may examine the alternative that means are different but variances identical
throughout, in which case we shall find that the residual still provides an estimate of the
variance, provided that an important additional assumption is made.)


Example 23.2 

The following data (Daniels, Supp. J.R.S.S., 1938, 5, 89) show the weight in grams
of 96-yard lengths of wool thread from 100 “ends” being spun on four bobbins, 25 ends
to the bobbin. We are interested in two factors, the variation between bobbins and the
variation in the 25 ends on the same bobbin, according to their position.

TABLE 23.3

Weight in Grams of 100 95-yard Lengths of Wool Thread spun on Four Bobbins.

                          Bobbin Number.
End Number.       1          2          3          4        Totals.

     1          7.50       7.23       7.50       7.53       29.76
     2          7.52       7.81       7.77       8.05       31.15
     3          7.70       7.94       7.83       8.16       31.63
     4          7.93       7.94       7.96       7.76       31.59
     5          7.78       7.89       8.02       7.85       31.54
     6          7.73       8.23       7.99       8.14       32.09
     7          8.07       8.27       8.25       8.26       32.85
     8          8.01       8.54       8.24       8.54       33.33
     9          8.22       8.24       8.37       8.10       32.93
    10          8.24       8.35       8.43       8.15       33.17
    11          8.17       8.29       8.46       8.38       33.30
    12          8.09       8.54       8.33       8.47       33.43
    13          8.11       8.45       8.27       8.38       33.21
    14          7.96       8.43       8.24       8.60       33.23
    15          8.09       8.47       8.12       8.45       33.13
    16          8.04       8.33       8.14       8.43       32.94
    17          7.78       8.47       8.19       8.57       33.01
    18          8.11       8.63       8.36       8.38       33.48
    19          8.17       8.31       8.31       8.16       32.95
    20          8.12       8.31       8.47       8.41       33.31
    21          8.13       8.10       8.19       8.27       32.69
    22          8.01       8.01       8.37       7.96       32.35
    23          8.17       7.92       8.27       8.08       32.44
    24          8.05       8.27       8.07       8.16       32.55
    25          7.91       7.92       8.28       8.52       32.63

  Totals      199.61     204.89     204.43     205.76      814.69



It simplifies the arithmetic if we take a working mean at 8.00. The total sum of
squares about this mean is then found to be

\[ \sum (x_{jk})^2 = 9.3829, \]

and we have also

\[ \sum (x_{jk}) = 14.69. \]

Hence

\[ \sum (x_{jk} - x_{..})^2 = 9.3829 - (0.1469)(14.69) = 7.224{,}939. \]

The means of the four bobbins are

7.9844, 8.1956, 8.1772, 8.2304.

With the same working mean we find for the sum of squares

\[ \sum_k (x_{.k})^2 = 0.122{,}986{,}72; \]

and hence

\[ p \sum_k (x_{.k} - x_{..})^2 = 25 (0.122{,}986{,}72) - (0.1469)(14.69) = 0.916{,}707. \]

The means of the four ends of corresponding position on the four bobbins can, of
course, be found from the totals in the last column of the table, but it is simpler to find
$\sum_j (q x_{j.})^2$ about the working mean and then divide by $q^2$. We find

\[ q \sum_j (x_{j.} - x_{..})^2 = 4 \left( \frac{27.1831}{16} \right) - (0.1469)(14.69) = 4.637{,}814. \]

The continual appearance of the factor $(0.1469)(14.69) = N x_{..}^2$ is to be noted. The
quantity is best computed once for all at the outset.

The residual sum of squares is then obtainable by subtraction, and we have the 
following analysis :— 


TABLE 23.4

Analysis of Variance for the Data of Table 23.3.

                       Sums of Squares.     d.f.     Quotient.

Between bobbins           0.916,707           3        0.3056
Between ends              4.637,814          24        0.1932
Residual                  1.670,418          72        0.0232

Totals                    7.224,939          99        0.0730



The variation between bobbins and that between ends are both significant—the ratio 
of the corresponding quotients to the residual quotient is so big in each case as hardly to 
require the z-test. We are led to suspect that the variation between bobbins, small as it 
is, cannot be a chance effect, and it looks as if bobbin number 1 is not getting its fair share 
of thread. Similarly, the weight of thread seems to be dependent on whereabouts the 
thread is spun on the bobbins, and an inspection of the original data suggests a systematic 
variation as we proceed along the bobbin from end number 1 to end number 25, with a 
possible maximum in the middle. If the manufacturing process is to be standardised as 
much as possible, we should have to examine the reasons for the shortage of weight on 
the first bobbin and for this systematic effect of position on the bobbin. 
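The whole analysis of this example can be reproduced from the table of weights. The sketch below is illustrative only: the weights are entered in hundredths of a gram, the entry for end 17 on bobbin 4 is taken as 8.57 (the value that agrees with the printed row and column totals), and all variable names are assumptions of the sketch.

```python
# Weights in hundredths of a gram; rows are ends 1..25, columns bobbins 1..4.
weights = [
    (750, 723, 750, 753), (752, 781, 777, 805), (770, 794, 783, 816),
    (793, 794, 796, 776), (778, 789, 802, 785), (773, 823, 799, 814),
    (807, 827, 825, 826), (801, 854, 824, 854), (822, 824, 837, 810),
    (824, 835, 843, 815), (817, 829, 846, 838), (809, 854, 833, 847),
    (811, 845, 827, 838), (796, 843, 824, 860), (809, 847, 812, 845),
    (804, 833, 814, 843), (778, 847, 819, 857), (811, 863, 836, 838),
    (817, 831, 831, 816), (812, 831, 847, 841), (813, 810, 819, 827),
    (801, 801, 837, 796), (817, 792, 827, 808), (805, 827, 807, 816),
    (791, 792, 828, 852),
]

p, q = len(weights), len(weights[0])               # p = 25 ends, q = 4 bobbins
n = p * q
d = [[w - 800 for w in row] for row in weights]    # deviations from 8.00, hundredths

grand_total = sum(map(sum, d))                     # 1469, i.e. 14.69 grams
correction = grand_total ** 2 / n                  # N * xbar^2 in (hundredths)^2

ss_total = sum(v * v for row in d for v in row) - correction
ss_bobbins = sum(sum(row[k] for row in d) ** 2 for k in range(q)) / p - correction
ss_ends = sum(sum(row) ** 2 for row in d) / q - correction
ss_residual = ss_total - ss_bobbins - ss_ends

scale = 1e-4                                       # (hundredths of a gram)^2 to grams^2
analysis = {
    "bobbins": (ss_bobbins * scale, q - 1),
    "ends": (ss_ends * scale, p - 1),
    "residual": (ss_residual * scale, (p - 1) * (q - 1)),
    "total": (ss_total * scale, n - 1),
}
quotients = {name: ss / df for name, (ss, df) in analysis.items()}
for name, (ss, df) in analysis.items():
    print(f"{name:9s} {ss:9.6f} {df:3d} {quotients[name]:7.4f}")
```

The ratios of the bobbin and end quotients to the residual quotient, about 13 and 8 respectively, are what the text describes as too large to need a formal test.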

23.15. Suppose now that, as in the example just given, the hypothesis of homogeneity
is rejected. What interpretation can we put on the residual quotient? Let us
assume that each observation comes from a normal population with variance $v$, but that
the parent mean of the subclass $A_j B_k$ is $m_{jk}$, these quantities varying from one subclass
to another. Is the residual quotient an unbiassed estimator of $v$? In general the answer
is “no”, but there is an important class of case in which it is affirmative.

Let $m_{j.}$ be the mean of the q values of $m_{jk}$ in the class $A_j$, $m_{.k}$ that of the p values
in $B_k$, and $m_{..}$ the mean of the whole set of m's. Then we may write

\[ x_{jk} = m_{jk} + \zeta_{jk} , \tag{23.16} \]

\[ x_{j.} = m_{j.} + \zeta_{j.} , \ \text{etc.} \tag{23.17} \]

Then

\[
\begin{aligned}
E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2
  &= E \sum (m_{jk} - m_{j.} - m_{.k} + m_{..} + \zeta_{jk} - \zeta_{j.} - \zeta_{.k} + \zeta_{..})^2 \\
  &= \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + E \sum (\zeta_{jk} - \zeta_{j.} - \zeta_{.k} + \zeta_{..})^2 ,
\end{aligned}
\tag{23.18}
\]

the product term vanishing as usual. The second term on the right is equal to
$(p - 1)(q - 1) v$, for the $\zeta$'s are distributed with variance $v$ about zero mean, so that the
term in question is the residual sum of squares in a p x q two-way classification of a
homogeneous sample and hence has the stated expectation. Thus we have

\[ E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + (p - 1)(q - 1) v . \tag{23.19} \]

The residual quotient will then provide an unbiassed estimator of $v$ if and only if

\[ m_{jk} - m_{j.} - m_{.k} + m_{..} = 0 . \tag{23.20} \]


23.16. Now suppose that $x_{jk}$ is made up of three parts which are additive, viz.

(1) the effect of the class $A_j$, say $a_j$;

(2) the effect of the class $B_k$, say $b_k$; and

(3) a residual which is normal and has zero mean.

This kind of hypothesis will recur frequently. It amounts to an assumption that there
is in $x_{jk}$ an element $a_j$ which affects alike all members of the class $A_j$ but varies from one
A-class to another; an element $b_k$ which similarly affects alike all members of $B_k$ but varies
from B-class to B-class; and a third component representing random variation which,
apart from the sampling factor, is the same for all subclasses $A_j B_k$. We then have

\[ x_{jk} = a_j + b_k + \zeta_{jk} \tag{23.21} \]

and

\[
\begin{aligned}
m_{jk} &= a_j + b_k \\
m_{j.} &= a_j + b_. \\
m_{.k} &= a_. + b_k \\
m_{..} &= a_. + b_.
\end{aligned}
\tag{23.22}
\]

where, as usual, the subscript periods in the a's and b's denote averaging. Thus

\[ m_{jk} - m_{j.} - m_{.k} + m_{..} = a_j + b_k - (a_j + b_.) - (a_. + b_k) + a_. + b_. = 0 , \]

so that (23.20) is satisfied and the residual quotient is an unbiassed estimator of the
variance $v$.
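The condition (23.20) may be illustrated numerically: when the parent means are sums of a row effect and a column effect the quantities $m_{jk} - m_{j.} - m_{.k} + m_{..}$ all vanish, whereas a non-additive table of means leaves some of them different from zero. The effects chosen below are arbitrary and purely illustrative, and the function name is an assumption of the sketch.

```python
from fractions import Fraction

def mean_interactions(m):
    """Return the p x q table of m_jk - m_j. - m_.k + m.. for parent means m."""
    p, q = len(m), len(m[0])
    grand = Fraction(sum(map(sum, m)), p * q)
    row = [Fraction(sum(r), q) for r in m]
    col = [Fraction(sum(m[j][k] for j in range(p)), p) for k in range(q)]
    return [[m[j][k] - row[j] - col[k] + grand for k in range(q)]
            for j in range(p)]

a = [5, -2, 7]           # illustrative A-class effects a_j
b = [1, 4, 0, -3]        # illustrative B-class effects b_k

additive = [[aj + bk for bk in b] for aj in a]          # m_jk = a_j + b_k
non_additive = [[aj * bk for bk in b] for aj in a]      # effects entangled

additive_ok = all(v == 0 for r in mean_interactions(additive) for v in r)
entangled = any(v != 0 for r in mean_interactions(non_additive) for v in r)
print(additive_ok, entangled)
```

In the second case the sum of squares of these quantities is the amount by which, according to (23.19), the expected residual sum of squares exceeds $(p - 1)(q - 1) v$.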

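Under the same conditions it will be found that

\[
\begin{aligned}
E\, q \sum_j (x_{j.} - x_{..})^2 &= (p - 1) v + q \sum_j (m_{j.} - m_{..})^2 \\
  &= (p - 1) v + q \sum_j (a_j - a_.)^2
\end{aligned}
\tag{23.23}
\]

\[ E\, p \sum_k (x_{.k} - x_{..})^2 = (q - 1) v + p \sum_k (b_k - b_.)^2 \tag{23.24} \]

\[
\begin{aligned}
E \sum_{j,k} (x_{jk} - x_{..})^2 &= (pq - 1) v + \sum_{j,k} (a_j - a_. + b_k - b_.)^2 \\
  &= (pq - 1) v + q \sum_j (a_j - a_.)^2 + p \sum_k (b_k - b_.)^2 .
\end{aligned}
\tag{23.25}
\]
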

23.17. We have supposed that the component $\zeta$ had a zero mean, but of course if
all these components had the same mean, the constant common to them could be absorbed
into the functions $a_j$ and $b_k$. Our hypothesis is thus a little more general than it appears.
In certain practical cases it is a plausible hypothesis to make. For instance, in Example
23.2 it is reasonable to suppose that the effect of a particular bobbin is the same for all
ends, and the effect of situation the same for all bobbins. If there is any serious doubt
on the point we have to collect further data and consider interactions in the manner
described later (see 23.22).

It may, however, be noted that if the variation of the $m_{jk}$'s is comparatively small
the appearance of the term containing them in (23.19) does not materially vitiate an estimate
of $v$ from the residual quotient. In any case that estimate will be greater than the unbiassed
estimate, so that our inferences about significant differences of mean values will, properly
interpreted, be on the safe side.

23.18. Before going farther we may remark that the quantity we have called the 
residual sum of squares and the associated quotient are often referred to as “ error ” or 
“ interaction ” terms. The former is likely to cause misunderstanding and is better avoided 
altogether, for, as we have seen, it provides a measure of sampling variance, and therefore
of experimental error, only in particular cases. The word “ interaction ” we shall
define below ; it has been used in different senses by different writers, and when consulting 
original memoirs the reader should endeavour to ascertain the precise meaning which 
is being attached to it—if he can. In considering a given analysis it is as well to reflect 
on the precise nature of the items covered by such expressions as “ residual ”, “ remainder ”, 
“ error ” and so forth. 


Three-way Classification 

23.19. Consider now the case when there are three classifications into A-, B- and
C-classes. As before, we shall consider in the first place one member in each subclass
$A_j B_k C_l$, typified by $x_{jkl}$. We now have

\[
\begin{aligned}
\sum_{j,k,l} (x_{jkl} - x_{...})^2
  ={}& \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 + \sum (x_{..l} - x_{...})^2 \\
  &+ \sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 + \sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2 \\
  &+ \sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2 \\
  &+ \sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2 ,
\end{aligned}
\tag{23.26}
\]

the summations extending over all members of the sample, pqr in number, so that we may
replace expressions such as $\sum_{j,k,l} (x_{j..} - x_{...})^2$ by $qr \sum_j (x_{j..} - x_{...})^2$, etc.

iTkfl, i 

On the usual hypothesis of normality and homogeneity we find that the first three
terms on the right of (23.26) are distributed as $v\chi^2$ with $p - 1$, $q - 1$ and $r - 1$ degrees
of freedom. The second group is so distributed with $(p - 1)(q - 1)$, $(p - 1)(r - 1)$ and
$(q - 1)(r - 1)$ degrees of freedom. The last is distributed with $(p - 1)(q - 1)(r - 1)$
degrees of freedom. All but the last of these results follow from the two-way case, and
the last may be established (as in 23.13 or) by the consideration that for any fixed l the
term has $(p - 1)(q - 1)$ degrees of freedom and that there are $(r - 1)$ independent l's.
We may then write the analysis in the form shown in Table 23.5. (For the present
the expression “interaction AB” is to be regarded merely as a name given to a particular
sum of squares. As before, the sums of squares and degrees of freedom are additive,
and the seven items into which the total sum of squares is analysed are distributed
independently.)

TABLE 23.5

Form of Analysis of Variance for Three-way Classification with One Member in each Subclass.

                     Sum of Squares.                                                                    d.f.

Between A-classes    $\sum (x_{j..} - x_{...})^2$                                                       $p - 1$
Between B-classes    $\sum (x_{.k.} - x_{...})^2$                                                       $q - 1$
Between C-classes    $\sum (x_{..l} - x_{...})^2$                                                       $r - 1$
Interaction AB       $\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2$                                   $(p - 1)(q - 1)$
Interaction BC       $\sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2$                                   $(q - 1)(r - 1)$
Interaction CA       $\sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2$                                   $(r - 1)(p - 1)$
Residual             $\sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2$   $(p - 1)(q - 1)(r - 1)$

Totals               $\sum (x_{jkl} - x_{...})^2$                                                       $pqr - 1$

Quotient: in each line, the quotient of the sum of squares by the corresponding d.f.


23.20. If the hypothesis of homogeneity is rejected we may consider the alternative
represented by

\[ x_{jkl} = a_j + b_k + c_l + \zeta_{jkl} , \tag{23.27} \]

where $\zeta$, as usual, is normal with zero mean. As in 23.16 it will be found that the residual
term in Table 23.5 has expectation $(p - 1)(q - 1)(r - 1) v$, and hence continues to provide
an unbiassed estimator of $v$. The quotients between classes are affected like those in
equations (23.23) to (23.25); but the interaction terms also provide estimators of $v$ with
the appropriate degrees of freedom. For instance,

\[
\begin{aligned}
x_{jk.} - x_{j..} - x_{.k.} + x_{...}
  ={}& a_j + b_k + c_. + \zeta_{jk.} - (a_j + b_. + c_. + \zeta_{j..}) \\
  &- (a_. + b_k + c_. + \zeta_{.k.}) + (a_. + b_. + c_. + \zeta_{...}) \\
  ={}& \zeta_{jk.} - \zeta_{j..} - \zeta_{.k.} + \zeta_{...} ,
\end{aligned}
\tag{23.28}
\]

so that the expectation of the sum of squares of the x-terms is that of the $\zeta$-terms, which
we know to be $(p - 1)(q - 1) v$.

23.21. This brings up a new point arising for the first time in the three-way
classification. If (23.27) is true, the analysis of variance will provide four different estimators
of the variance $v$, namely the interactions AB, BC and CA and the residual. These are
independent (for they depend only on the $\zeta$'s, and the theory appropriate to the case of
homogeneity continues to apply) and their ratios may be tested in the z-distribution. If
these ratios are such as can have arisen from random sampling we may accept the hypothesis
represented by (23.27); if not we must reject it. In short, the interaction quotients
provide a test of the hypothesis (23.27). In the two-way classification no such test is available.

Interactions

23.22. On the hypothesis (23.27) the interaction quotients of type AB give unbiassed
estimators of the variance $v$. If in any particular case these quotients differ significantly
among themselves or from any other independent estimator of $v$, we have to reject the
hypothesis. Apart from the normality of the variation of $\zeta$, which is not for the moment
in question, this means that we cannot represent the data as the sum of separate effects
due to A-, B- and C-classes, together with a residual $\zeta$ which is the same in form for all
subclasses. The effects of the classes are entangled or, as we may say, they interact.
This is the origin of the term “interaction”.

Suppose, for instance, our data are crop-yields, and membership of the three classes
corresponds to applications of three manures, nitrogen (A), potash (B) and phosphate (C).
The hypothesis represented by (23.27) would then be equivalent to supposing that all three
manures exerted an effect on yields, but that they did so independently. A given dressing
of nitrogen would increase the yield by $a_j$, whatever dressings of the other fertilisers were
applied. But it might happen that the response in yield to $a_j$ varied according to how
much of the others were present; potash might either stimulate the effect of nitrogen or
inhibit it. If this were so, the fertilisers would interact and the hypothesis (23.27) would
break down. Significant departures from homogeneity in the interaction terms usually
lead us to search for possible entanglements of this kind.

23.23. It must not be overlooked, however, that significant interactions do not
necessarily imply interaction in any real sense. They may arise from heterogeneity in
the data. To return to our example of crop-yields, suppose the yields were taken from
a series of plots which differed materially in natural fertility. It might very well be found
that the hypothesis (23.27) could not be justified even if the differences in yields due to
the natural effect were partially absorbed into the coefficients a, b and c. If by chance
the heavier dressings of fertilisers were applied to plots of greater fertility, the hypothesis
might be shown as failing and “significant” interactions appear. Such points as this
require careful consideration in the interpretation of significance, and we shall illustrate
them in some examples below.

23.24. Interactions of type AB, involving two classes, are said to be of the first
order. When considering the general n-way classification we shall see that there can
appear interactions of second, third, fourth . . . order. In fact, the residual in Table 23.5
is formally equivalent to an interaction of the second order, of type ABC, just as the
first-order interaction is equivalent to the residual in the two-way analysis of Table 23.2.

To complete the definitions, we may define the sum of squares between A-classes as
an interaction of order zero. The seven constituent items in Table 23.5 would then
correspond to the following:

Interaction.          d.f.

Order zero            $p - 1$
                      $q - 1$
                      $r - 1$

Order 1               $(p - 1)(q - 1)$
                      $(q - 1)(r - 1)$
                      $(r - 1)(p - 1)$

Order 2               $(p - 1)(q - 1)(r - 1)$

This illustrates the general symmetry of the analysis and suggests obvious generalisations.

n-way Classifications 

23.25. For instance, with five classes A, B, C, D and E we may analyse the total
sums of squares into $2^5 - 1 = 31$ components. There will be $\binom{5}{1} = 5$ interactions of
order zero; $\binom{5}{2} = 10$ interactions of first order, type AB; $\binom{5}{3} = 10$ interactions of
second order, type ABC; $\binom{5}{4} = 5$ interactions of third order, type ABCD; and one
residual or interaction of fourth order, type ABCDE. The interactions of zero, first and
second order are of a type already familiar:

\[
\begin{gathered}
\sum (x_{j....} - x_{.....})^2 \\
\sum (x_{jk...} - x_{j....} - x_{.k...} + x_{.....})^2 \\
\sum (x_{jkl..} - x_{jk...} - x_{.kl..} - x_{j.l..} + x_{j....} + x_{.k...} + x_{..l..} - x_{.....})^2 .
\end{gathered}
\tag{23.29}
\]

The third-order interactions are typified by

\[
\begin{aligned}
\sum (& x_{jklm.} - x_{jkl..} - x_{.klm.} - x_{j.lm.} - x_{jk.m.} + x_{jk...} + x_{j.l..} + x_{j..m.} \\
  &+ x_{.kl..} + x_{.k.m.} + x_{..lm.} - x_{j....} - x_{.k...} - x_{..l..} - x_{...m.} + x_{.....})^2
\end{aligned}
\tag{23.30}
\]

and the reader will be able to write down the residual for himself.
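The counting of interactions, and the pattern of signs in expressions such as (23.29) and (23.30), follow a simple rule which can be mechanised. The sketch below is illustrative; the function `interaction_terms` and its representation of a marginal mean by the letters of the retained classes are assumptions of the sketch, not notation from the text.

```python
from itertools import combinations
from math import comb

factors = "ABCDE"

# Number of interactions of order 0, 1, 2, 3, 4 among five classes.
counts = [comb(len(factors), order + 1) for order in range(len(factors))]

def interaction_terms(involved):
    """Signed marginal means entering the interaction of the classes `involved`.

    Each term is (sign, kept): `kept` names the classes whose subscripts are
    retained, every other subscript being averaged out (replaced by a period).
    The sign is + or - according as an even or odd number of the involved
    classes has been averaged out.
    """
    k = len(involved)
    terms = []
    for size in range(k, -1, -1):
        sign = (-1) ** (k - size)
        for kept in combinations(involved, size):
            terms.append((sign, "".join(kept)))
    return terms

# Third-order interaction of type ABCD: sixteen terms as in (23.30).
third_order = interaction_terms("ABCD")
# Residual (fourth-order interaction ABCDE): thirty-two terms.
residual_terms = interaction_terms("ABCDE")
print(counts, len(third_order), len(residual_terms))
```

Here the empty string stands for the general mean, so that the residual consists of the individual value, less the five means over one class, plus the ten means over two classes, and so on with alternating sign down to the general mean.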

As usual, the 31 terms all furnish independent estimators of the variance on the
hypothesis of homogeneity, and if this is rejected we may consider the alternative
represented by

\[ x_{jklmn} = a_j + b_k + c_l + d_m + e_n + \zeta_{jklmn} . \tag{23.31} \]

The complete analysis in such cases may become very complex, but frequently it is sufficient
to consider only sums of squares suggested for investigation by prior expectations.

Example 23.3 

The following data show the percentage water-content in a number of samples of 
a commercial product. Six samples were chosen ; each sample was tested by four different 
operators; and each operator carried out the determination by three different methods.
We have thus a 6 x 4 x 3 classification. 

TABLE 23.6

Percentage Water-Content of Six Samples determined by Four Operators using Three Methods.

                                     Operators.
                  1                2                3                4
                Tests.           Tests.           Tests.           Tests.
Samples.      1   2   3        1   2   3        1   2   3        1   2   3

   1         59  61  61       57  60  58       55  58  62       54  56  59
   2         57  58  60       57  58  58       61  60  57       60  56  58
   3         55  57  59       55  55  56       54  52  58       53  55  55
   4         60  57  58       56  57  57       54  58  55       61  59  58
   5         61  61  60       59  58  59       61  57  60       62  60  60
   6         63  59  60       62  63  61       64  62  59       59  60  61

We will first of all analyse the variance systematically with rather more arithmetical
detail than is usually required, in order to illustrate the process.

A great deal of work is saved if we take a working mean at 60. The table then becomes:


TABLE 23.7

                                                  Operators.
                    1                      2                      3                      4
                  Tests.                 Tests.                 Tests.                 Tests.
Samples.     1    2    3  Totals    1    2    3  Totals    1    2    3  Totals    1    2    3  Totals    Totals

   1        -1    1    1     1     -3    0   -2    -5     -5   -2    2    -5     -6   -4   -1   -11       -20
   2        -3   -2    0    -5     -3   -2   -2    -7      1    0   -3    -2      0   -4   -2    -6       -20
   3        -5   -3   -1    -9     -5   -5   -4   -14     -6   -8   -2   -16     -7   -5   -5   -17       -56
   4         0   -3   -2    -5     -4   -3   -3   -10     -6   -2   -5   -13      1   -1   -2    -2       -30
   5         1    1    0     2     -1   -2   -1    -4      1   -3    0    -2      2    0    0     2        -2
   6         3   -1    0     2      2    3    1     6      4    2   -1     5     -1    0    1     0        13

 Totals     -5   -7   -2   -14    -14   -9  -11   -34    -11  -13   -9   -33    -11  -14   -9   -34      -115



We have shown the totals of the tests for each operator, of the tests for all operators, and 
of samples for each test. 

We now form three two-way tables from this by adding the values of one of the 
variates, e.g.— 


TABLE 23.8

                          Operators.
Samples.       1        2        3        4      Totals.

   1           1       -5       -5      -11       -20
   2          -5       -7       -2       -6       -20
   3          -9      -14      -16      -17       -56
   4          -5      -10      -13       -2       -30
   5           2       -4       -2        2        -2
   6           2        6        5        0        13

 Totals      -14      -34      -33      -34      -115



TABLE 23.9

                       Tests.
Samples.       1        2        3      Totals.

   1         -15       -5        0       -20
   2          -5       -8       -7       -20
   3         -23      -21      -12       -56
   4          -9       -9      -12       -30
   5           3       -4       -1        -2
   6           8        4        1        13

 Totals      -41      -43      -31      -115



TABLE 23.10

                          Operators.
Tests.         1        2        3        4      Totals.

   1          -5      -14      -11      -11       -41
   2          -7       -9      -13      -14       -43
   3          -2      -11       -9       -9       -31

 Totals      -14      -34      -33      -34      -115



As we have inserted the totals of various kinds in Table 23.7 these subsidiary tables 
can be picked out at once ; but in general, totals are not available in the original (and for 
four-way classifications it is difficult to find a form of tabular presentation which will permit 
of their insertion) so that the tables have to be separately compiled. In practice I find it 
convenient to do so in any case to avoid picking out the wrong figures in the original table. 

Pursuing the condensation process, we should now derive three one-way tables from 
Tables 23.8 to 23.10, but in fact the row and column totals already give us what is required 
(and incidentally provide a check on the arithmetic). 
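The condensation of the three-way table into the three two-way tables amounts to summing over one subscript at a time. A minimal sketch follows, assuming the deviations from the working mean 60 are entered as below (values chosen to agree with the printed marginal totals); the variable names are illustrative.

```python
# Deviations from the working mean 60, indexed data[j][k][l]:
# j = sample (6), k = operator (4), l = test (3).
data = [
    [(-1, 1, 1), (-3, 0, -2), (-5, -2, 2), (-6, -4, -1)],
    [(-3, -2, 0), (-3, -2, -2), (1, 0, -3), (0, -4, -2)],
    [(-5, -3, -1), (-5, -5, -4), (-6, -8, -2), (-7, -5, -5)],
    [(0, -3, -2), (-4, -3, -3), (-6, -2, -5), (1, -1, -2)],
    [(1, 1, 0), (-1, -2, -1), (1, -3, 0), (2, 0, 0)],
    [(3, -1, 0), (2, 3, 1), (4, 2, -1), (-1, 0, 1)],
]
P, Q, R = len(data), len(data[0]), len(data[0][0])

# Samples x operators: sum over the three tests.
sample_by_operator = [[sum(data[j][k]) for k in range(Q)] for j in range(P)]
# Samples x tests: sum over the four operators.
sample_by_test = [[sum(data[j][k][l] for k in range(Q)) for l in range(R)]
                  for j in range(P)]
# Tests x operators: sum over the six samples.
test_by_operator = [[sum(data[j][k][l] for j in range(P)) for k in range(Q)]
                    for l in range(R)]

# One-way totals follow from the margins of the two-way tables.
sample_totals = [sum(row) for row in sample_by_operator]
operator_totals = [sum(sample_by_operator[j][k] for j in range(P)) for k in range(Q)]
test_totals = [sum(row) for row in test_by_operator]
grand_total = sum(sample_totals)
print(sample_totals, operator_totals, test_totals, grand_total)
```

That the three sets of marginal totals all add to the same grand total is the check on the arithmetic mentioned in the text.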

Now we proceed to find the various sums of squares. For the total of all observations
we find -115, and for the sum of squares of observations 653. Thus

\[ x_{...} = -\frac{115}{72} = -1.597{,}222 \]

\[ N x_{...}^2 = -115\, x_{...} = 183.680{,}556 \]

\[
\begin{aligned}
\sum (x_{jkl} - x_{...})^2 &= \sum (x_{jkl})^2 - N x_{...}^2 \\
  &= 653 - 183.680{,}556 \\
  &= 469.319{,}444
\end{aligned}
\tag{23.32}
\]

with 6 x 4 x 3 - 1 = 71 degrees of freedom.

For the interactions of order zero we require the sums of type

\[ \sum (x_{j..} - x_{...})^2 = \sum (x_{j..})^2 - N x_{...}^2 , \]

where summation takes place over the N values. It is, however, unnecessary to work out
the means $x_{j..}$. Consider, for example, the sum of squares between samples. From the
totals of Table 23.8 or Table 23.9 we find (j denoting samples)

\[ \sum (12 x_{j..})^2 = (-20)^2 + (-20)^2 + \dots + 13^2 = 5009, \]

where the summation is over six values only. Thus, for summation over the 72 values,

\[ \sum (x_{j..})^2 = \frac{12}{144}\, 5009 = 417.416{,}667. \]

Hence

\[ \sum (x_{j..} - x_{...})^2 = 417.416{,}667 - 183.680{,}556 = 233.736{,}111 \tag{23.33} \]

with 6 - 1 = 5 d.f.

Similarly (k denoting operators) we find

\[ \sum (x_{.k.} - x_{...})^2 = \frac{3597}{18} - 183.680{,}556 = 16.152{,}778 \tag{23.34} \]

with 3 d.f.; and (l denoting tests)

\[ \sum (x_{..l} - x_{...})^2 = \frac{4491}{24} - 183.680{,}556 = 3.444{,}444 \tag{23.35} \]

with two degrees of freedom.

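Now we require first-order interactions. We have (summation being over the N
values)

\[
\begin{aligned}
\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2
  ={}& \sum (x_{jk.} - x_{...})^2 + \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 \\
  &- 2 \sum (x_{jk.} - x_{...})(x_{j..} - x_{...}) - 2 \sum (x_{jk.} - x_{...})(x_{.k.} - x_{...}) \\
  ={}& \sum (x_{jk.} - x_{...})^2 - \sum (x_{j..} - x_{...})^2 - \sum (x_{.k.} - x_{...})^2 ,
\end{aligned}
\tag{23.36}
\]

the product of $(x_{j..} - x_{...})$ and $(x_{.k.} - x_{...})$ vanishing on summation, and thus the first-order
interaction term is ascertainable from $\sum (x_{jk.})^2$ and quantities which have already been computed.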

From the body of Table 23.8 (remembering that summation relates to 72 values and
hence that each value in the table is counted 3 times) we find

\[ \sum (x_{jk.})^2 = \frac{3}{9} \{ 1^2 + (-5)^2 + \dots \} = \frac{1499}{3} = 499.666{,}667. \]

The interaction term is then

\[ 499.666{,}667 - 183.680{,}556 - 233.736{,}111 - 16.152{,}778 = 66.097{,}222 \tag{23.37} \]

with (6 - 1)(4 - 1) = 15 d.f.

Similarly in the body of Table 23.9 we find for the sum of squares 1915. Hence the 
interaction of samples and tests is 

- 183-680,556 - 233-736,111 - 3-444,444 = 57-888,889. . (23.38) 


A.S.—VOL. II. 


O 
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In the body of Table 23.10 the sum of squares is 1245. Hence the interaction of tests 
and operators is 

- 1 — - 183-680,556 - 16-152,778 - 3-444,444 = 4-222,222. . (23.39) 

6 

Finally, the residual is given by the difference of the total sum of squares and the 
interactions already found, namely by 

469-319,444 - 233-736,111 - 16-152,778 - 3-444,444 - 66-097,222 - 57-888,‘889 

- 4-222,222 = 87-777,778 . . . (23.40) 

with (6 — 1) (4 — 1) (3 — 1) = 30 degrees of freedom. 
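The first-order interactions and the residual may be checked together. The sketch below, in Python, takes the zero-order sums of squares and the three two-way sums as quoted in the text; the dictionary layout and function name are our own.

```python
correction = 115 ** 2 / 72        # 183.680556
total_ss = 653 - correction       # 469.319444
main = {"S": 233.736111, "O": 16.152778, "T": 3.444444}

def interaction(sum_sq_means, a, b):
    """First-order interaction: two-way sum less correction and both main terms."""
    return sum_sq_means - correction - main[a] - main[b]

inter = {
    "SO": interaction(499.666667, "S", "O"),  # Table 23.8, each cell sums 3 readings
    "ST": interaction(1915 / 4, "S", "T"),    # Table 23.9, each cell sums 4 readings
    "OT": interaction(1245 / 6, "O", "T"),    # Table 23.10, each cell sums 6 readings
}
residual = total_ss - sum(main.values()) - sum(inter.values())
residual_df = (6 - 1) * (4 - 1) * (3 - 1)

print({k: round(v, 3) for k, v in inter.items()}, round(residual, 3), residual_df)
```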

We can now make up the table of variance analysis as follows :— 

TABLE 23.11 

Analysis of Variance of Data of Table 23.7. 

                          Sum of Squares.   d.f.   Quotient. 
Between samples (S)           233.736         5     46.747 
Between operators (O)          16.153         3      5.384 
Between tests (T)               3.444         2      1.722 
Interaction SO                 66.097        15      4.406 
Interaction OT                  4.222         6      0.704 
Interaction ST                 57.889        10      5.789 
Residual                       87.778        30      2.926 

Totals                        469.319        71 



We proceed to discuss the data in the light of this analysis. 

The most striking feature of the table is the size of the quotient between samples. 
The variance ratio here is 46.747/2.926 = 15.976, with a corresponding value of z equal to 1.38. 
For $\nu_1 = 5$, $\nu_2 = 30$ the 0.1-per-cent point is 0.8554, and the ratio is highly significant. 
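The relation between the variance ratio and z is simply that z is half the natural logarithm of the ratio. A brief Python sketch follows; the function name is our own, and the last line merely converts the quoted 0.1-per-cent point of z back to the scale of the variance ratio for comparison.

```python
import math

def fisher_z(larger, smaller):
    """Fisher's z: half the natural logarithm of the variance ratio."""
    return 0.5 * math.log(larger / smaller)

ratio = 46.747 / 2.926            # quotient between samples over residual quotient
z = fisher_z(46.747, 2.926)
ratio_at_point = math.exp(2 * 0.8554)   # variance ratio equivalent to z = 0.8554

print(round(ratio, 3), round(z, 3), round(ratio_at_point, 2))
```

The observed ratio of about 16 is thus to be compared with a ratio of about 5.5 at the 0.1-per-cent point.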

We remark in passing on a point which will be taken up later. The ordinary z-test 
gives the probability that the ratio of two variances chosen at random does not exceed 
a given value. But in this case we have deliberately picked out the largest quotient for 
one of our estimates. If z had fallen at the 5-per-cent level we could not have argued that 
the odds were 19 to 1 against the event. They are very much less, since we have deliberately 
chosen the largest value for comparison with the residual. However, in the present 
case our probability is so small that we can confidently assume the significance of z (see 
23.27 below). 

Our first inference, then, is that the whole sample is not homogeneous. There appear 
to be variations from sample to sample which are not assignable to differences between 
tests or operators, and if we wished to standardise our product with greater accuracy we 
should be led to examine the manufacturing process. This conclusion is, however, subject 
to a point which we discuss in the next example. 

Having rejected the hypothesis of homogeneity we are now faced with the question 
whether the other quotients in Table 23.11 can be compared so as to assess the relative 
variability of the other factors. We must then take a new hypothesis, and we will suppose 
that the variable may be written 

$$x_{jkl} = a_j + \xi_{jkl}, \qquad (23.41)$$

where $a_j$ is an unknown quantity expressing the accepted variation between samples. 
Unless there is something very peculiar about the tests or operators it is reasonable to 
suppose that the variation between samples can be isolated in this way. (We will now 
suppose that the ξ's, not the x's, are distributed normally with common mean and variance v.) 

If the values given by (23.41) are substituted in the various constituent items of Table 
23.5, it will be found that except for the variation between samples all the other sums of 
squares assume the same form with ξ written instead of x. This, of course, follows from 
23.20 of which our present hypothesis is a particular case. On the hypothesis of (23.41) 
we are thus enabled to compare the quotients in the table in the usual way. The element 
of variation between samples has, so to speak, been abstracted from the discussion. 

We then turn to the sum of squares between operators in Table 23.11. The variance 
ratio is 5.384/2.926 = 1.84. For $\nu_1 = 3$, $\nu_2 = 30$ this is not significant. Similarly, for the sum 
of squares between tests we find a ratio of 1.722/2.926, again not significant. Provisionally we 
conclude that there is no evidence of variation between operators and tests, apart from 
pure sampling effects. 

Now we have to consider the interactions. For that of SO we have the variance ratio 
4.406/2.926 = 1.51, which is not significant. We find the same for the interaction ST. For 
OT we have (taking the larger variance as the numerator) 

$$z = \tfrac{1}{2}\log_e \frac{2.926}{0.703} = 0.713, \qquad \nu_1 = 30,\ \nu_2 = 6.$$

This value is just beyond the 5 per cent point and, judged by itself, might have been regarded 
as significant; but taken in conjunction with the others it may, perhaps, be accepted as 
a permissible sampling fluctuation. 

To sum up, therefore, the only evidence of deviation from homogeneity appears in the 
sample-differences, and we see no reason to reject the hypothesis represented by (23.41). 
Since all the other items in the analysis, apart from that between samples, are homogeneous, 
we could condense the table into the form 

                      Sum of Squares.   d.f.   Quotient. 
Between samples           233.736         5     46.747 
Remainder                 235.583        66      3.569 

Totals                    469.319        71 


The reader may wonder why, in carrying out the tests of significance, we have throughout 
used the residual quotient as the denominator of the variance ratio, and not, for instance, 
one of the interactions. There are two reasons. First, the residual has more degrees of 
freedom, so that it is preferable notwithstanding that the z-test is valid for any number 
of degrees of freedom. Second, the residual is not so likely to be affected by interactions 
which, though not emerging into significance, might nevertheless exist. But once we have 
established that an interaction is not significant, there is no reason why it should not be 
amalgamated with the residual, as in the condensed table above. 

Example 23.4 

There is a point of great importance concerning the inference from analyses of variance, 
which we will illustrate by an imaginary example based on the data we have just considered. 
Suppose our analysis of variance were of the following form: 

                        Sum of Squares.   d.f.   Quotient. 
Between samples               125           5       25 
Between operators              60           3       20 
Interaction SO                150          15       10 
Remainder                      48          48        1 

Totals                        383          71 

We will suppose that the sums of squares between tests and the other first-order 
interactions are not significant, so that they can be amalgamated with the residual to give a 
remainder with 48 degrees of freedom as shown. 

On this evidence the sums of squares between samples and between operators are both 
significant, as also is the interaction SO. What inference can be drawn about the variability 
of the product from one sample to another? We know that the readings differ 
significantly; but may not this difference itself be due to the demonstrated variation 
between operators, or does it really exist? Is there in fact any variability in the water-content 
of the product, apart from the sampling effect in homogeneous variation? 

The significance of the SO interaction means that we cannot now regard the effects 
of operator and sample as independent. We must consider the possibility of entanglement. 
This is not the only explanation—there may be some other specific cause of variation 
present which we have not thought of, and on which our present data throw no light. But 
in this case there is some prior possibility that samples and operators are “ entangled ” or 
interacting in the ordinary sense. An operator may be getting better results from his 
material when it has high water-content than in the reverse case ; or, knowing that the 
mean content is near 60 per cent, he may unconsciously (or even consciously) bring his 
determinations nearer to that figure and hence reduce their spread. 

In a case of this kind, and indeed in all statistical inquiries, it is important to have 
a clear idea of the question which is being asked and of the population to which it relates. 
We have had a number of samples and have tested them by four operators each using 
three tests. So far as we can see, the tests are equivalent but the operators are not. All 
the same, we are not very interested in the variation among operators (unless this is 
an experiment in psychology and not in chemistry). What we want to know is whether 
the water-content varies in reality, that is to say as the average of a large number of 
determinations by different operators. Our particular four are themselves samples of 
a population of operators. 
If we confine our attention to the four operators and suppose that each has a specific 
reaction to particular samples $m_{jk}$, so that 

$$x_{jk} = m_{jk} + \xi_{jk}, \qquad (23.42)$$

where ξ is a normal random residual with variance v for all j, k, then in the usual 
way we find 

$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p - 1)(q - 1)v + \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2. \qquad (23.43)$$

But suppose we consider the matter from a different viewpoint. Regard $m_{jk}$ as itself 
chosen at random from a normal population of operators with variance v'. Then, taking 
expectations of this population in addition, we find from (23.43) 

$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p - 1)(q - 1)(v + v'). \qquad (23.44)$$

Thus the interaction term provides an unbiassed estimator of the variance v + v' of $x_{jk}$. 
By “unbiassed” in this connection we mean that the average over all determinations and 
all operators will give the variance of $x_{jk}$ in the population of all determinations and all 
operators. 

Similarly we shall have, on the same interpretation, 

$$E \sum (x_{j.} - x_{..})^2 = (p - 1)(v + v'), \qquad E \sum (x_{.k} - x_{..})^2 = (q - 1)(v + v'), \qquad (23.45)$$

and hence the ratio of either interaction of zero order to the first-order interaction may be 
tested for homogeneity. Our analysis then becomes 


                        Sum of Squares.   d.f.   Quotient. 
Between samples               125           5       25 
Between operators              60           3       20 
Residual (SO)                 150          15       10 

Totals                        335          23 

Neither ratio is now significant. For the sum of squares between samples we have 
a ratio of 2.5, $\nu_1 = 5$, $\nu_2 = 15$, which is below the 5 per cent point. 
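The whole contrast between the two analyses of this example lies in the choice of denominator. The following Python sketch sets the two sets of variance ratios side by side, using the imaginary figures of the example; the variable names are our own.

```python
# Quotients (sum of squares / d.f.) from the imaginary analysis of Example 23.4.
quotient = {
    "samples": 125 / 5,
    "operators": 60 / 3,
    "SO": 150 / 15,
    "remainder": 48 / 48,
}

# Four particular operators: every term is tested against the remainder.
against_remainder = {k: quotient[k] / quotient["remainder"]
                     for k in ("samples", "operators", "SO")}

# Operators regarded as a sample of operators: the SO interaction is the residual.
against_interaction = {k: quotient[k] / quotient["SO"]
                       for k in ("samples", "operators")}

print(against_remainder)
print(against_interaction)
```

The ratio for samples falls from 25 to 2.5 merely by changing the population to which the question is addressed.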

Thus we should conclude that, regarding the data as a member of possible samples from 
all possible operators, there is little or no evidence of real variation from sample to sample. 
This is quite consistent with the inference we drew at the beginning of the example as to 
the “significance” of the terms concerned, though at first sight it appears directly 
contradictory. In the first case we inferred that for these four operators there were significant 
differences in their determinations for the samples, so that sample-differences are 
“real” in the sense that they cannot be attributed solely to random variation in homogeneous 
material. In the second case we enlarge the domain by considering operators as 
subject to “error” in the sense that one human being differs from another, and find that 
sample-differences can now be ascribed to variation in the population of operators. 

No further emphasis is needed on the care necessary for the proper interpretation of 
the results of an analysis of variance. The nature of the population which is being considered 
should be brought explicitly to mind in every case; and the reader should form 
the habit of asking himself, whenever a result is found to be “significant”: significant 
of what? 


Arithmetic of Variance Analysis 

23.26. Before considering further examples we will dispose of a few points arising 
from the calculation of the constituent sums of squares and the application of the z-test 
in determining the significance of variance-ratios. 

The calculation of sums of squares for an n-way classification can very conveniently 
be carried out by the use of a punched-card system when the data are numerous, and some 
remarkable computing feats have been performed by this technique. For ordinary laboratory 
work with a machine, the process of Example 23.3 is possibly the best, though some 
modifications may be made to suit individual taste. 

The main work lies in computing the total sum of squares. This is done by finding 
the sum of squares of observations from the original data (with a convenient working 
mean) and the sum of observations obtained at the same time. The formula 

$$\sum (x_{jkl} - x_{...})^2 = \sum x_{jkl}^2 - N x_{...}^2 = \sum x_{jkl}^2 - \frac{1}{N}\Big(\sum x_{jkl}\Big)^2 \qquad (23.46)$$

then gives the total sum required. The quantity $N x_{...}^2$ is constantly needed and should 
be recorded. It is useful to preserve a few more decimal places than will ultimately be 
used in the final presentation of the analysis. 

The original data are then condensed into n (n - 1)-way tables by summing over 
each class in turn. In Example 23.3 this was done so as to give three tables: Operators-Samples, 
Tests-Samples and Operators-Tests. The main body of these tables gives means 
of the type $x_{jk.}$ multiplied by a constant factor. A further condensation will give 
sets of means of type $x_{j..}$; and so on, as far as is required. 

From the condensed tables we can then determine the sums of squares of means of 
various orders, and hence the interactions. The main pitfall lies in the application 
of the correct multipliers and divisors; it has to be borne in mind that the summation 
takes place over all values of the sample. 

Suppose, for example, we have a four-way classification into classes with p, q, r and s 
numbers of members. The first condensation gives us four tables of which a typical one 
is p × q × r, based on the sum of s members. The next condensation gives us six two-way 
tables typified by p × q, based on the sum of rs members. The third gives us four one-way 
tables such as p, based on qrs members. Consider the variance between p-classes: 

$$\sum (x_{j...} - x_{....})^2 = \sum x_{j...}^2 - N x_{....}^2. \qquad (23.47)$$

In the condensed one-way table of p classes each term is to be counted qrs times, and 
thus, if S is the sum of squares in this table as it stands, the sum of the squared means 
over the p classes is 

$$\frac{S}{(qrs)^2}.$$

Thus, summing over all members, we find 

$$\sum x_{j...}^2 = \frac{S}{qrs}, \qquad (23.48)$$

whence (23.47) gives the zero-order interaction for p-classes. Similarly for q, r and s. 

For the first-order interaction we have 

$$\sum (x_{jk..} - x_{j...} - x_{.k..} + x_{....})^2 = \sum (x_{jk..} - x_{....})^2 - \sum (x_{j...} - x_{....})^2 - \sum (x_{.k..} - x_{....})^2. \qquad (23.49)$$

The last two terms on the right have already been found. We require 

$$\sum (x_{jk..} - x_{....})^2 = \sum x_{jk..}^2 - N x_{....}^2. \qquad (23.50)$$

If S' is the sum of squares of elements in the body of the two-way table found by adding 
r- and s-items, we find 

$$\sum x_{jk..}^2 = \frac{S'}{rs}, \qquad (23.51)$$

and so on. The general process will now be clear. 
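The rule of multipliers and divisors can be put into a few lines of Python. The sketch below is an illustration under our own assumptions, not a transcription of any published routine: the data are held in a dictionary keyed by class indices, every subclass contains a single member, and the function names are invented for the purpose. It is tried on a small additive 2 × 2 × 2 array, for which the first-order interaction must vanish.

```python
from itertools import product

def sum_sq_of_means(data, keep):
    """Sum, over all N members, of the squared mean of the condensed table
    that keeps the axes in `keep`: this is S divided by the number of
    members added into each entry (assumes equal numbers throughout)."""
    totals = {}
    for idx, x in data.items():
        key = tuple(idx[a] for a in keep)
        totals[key] = totals.get(key, 0.0) + x
    per_entry = len(data) // len(totals)
    return sum(t * t for t in totals.values()) / per_entry

def main_effect(data, a):
    """Zero-order interaction for axis a, as in (23.47) and (23.48)."""
    return sum_sq_of_means(data, (a,)) - sum_sq_of_means(data, ())

def first_order(data, a, b):
    """First-order interaction for axes a and b, as in (23.49) to (23.51)."""
    return (sum_sq_of_means(data, (a, b)) - sum_sq_of_means(data, ())
            - main_effect(data, a) - main_effect(data, b))

# Purely additive toy data: x = 3i + 5j + k on a 2 x 2 x 2 classification.
data = {(i, j, k): 3 * i + 5 * j + k for i, j, k in product(range(2), repeat=3)}

print(main_effect(data, 0), main_effect(data, 1), main_effect(data, 2))
print(first_order(data, 0, 1))
```

With an empty `keep` the function returns the correction term itself, since the single entry of the condensed table is then the grand total and the divisor is N.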

Unfortunately there is no convenient independent check on the calculations. The 
various condensed tables are self-checking since their totals are the sum of all observations, 
but the sums of squares do not check with anything. It is, of course, possible to evaluate 
each individual term in the residual and to check by summing squares, but this is too 
laborious for use except in the simplest cases. 

Use of the z-test for Several Variance-ratios 

23.27. In the complete analysis of n classes there are $2^n - 1$ elements, and the 
number of variance ratios arising for test may be considerable. The z-test gives the probability 
that a particular value chosen at random will be exceeded. If therefore we pick 
out the largest ratios for test, the chance that one of them is “significant” in the sense 
of exceeding the 100P-per-cent point is a good deal greater than P, and we run into the 
danger of attributing significance to what may be a pure sampling effect. 

Suppose we make r different and independent tests of r values of z. The chance that 
each does not exceed a fixed value (depending on the number of degrees of freedom) is 
1 - P, where P is some assigned level of significance. Hence the chance that none of 
them exceeds its appropriate value is 

$$(1 - P)^r = 1 - rP, \text{ approximately,} \qquad (23.52)$$

provided that P and rP are small. For instance, if P = 0.01 and r = 7 the probability 
that no z exceeds its appropriate significance value is 0.93, and thus there is a probability 
of 0.07 that at least one of them will do so. 
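The closeness of the approximation in (23.52) is readily seen numerically. A minimal Python sketch, assuming independent tests as in the text (the function name is our own):

```python
def prob_none_exceeds(p, r):
    """Chance that none of r independent tests exceeds its 100p-per-cent point."""
    return (1 - p) ** r

p, r = 0.01, 7
exact = prob_none_exceeds(p, r)
approx = 1 - r * p

print(round(exact, 4), round(approx, 2), round(1 - exact, 4))
```

For P = 0.01 and r = 7 the exact value and the linear approximation agree to the two decimal places used in the text.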

In practice the problem of numerous comparisons is more complicated because they 
are not independent. In such circumstances our judgment of significance has to incorporate 
an element of the intuitive. However, if all the comparisons are based on the 
common residual quotient it is possible to find the probabilities that the largest of r values 
exceeds assigned values. The resulting expressions are complicated, even when all the 
sums of squares have the same degrees of freedom, but reference may be made to Hartley 
(1938) for approximations and to Cochran (1941) and Finney (1941a) for exact expressions. 
The conclusion reached by Finney is that if the degrees of freedom in the residual are 
sufficiently numerous the ratios may be treated as completely independent. 

23.28. There is a particular case of the n-way classification which is worth special 
mention, namely, that for which each classification is a simple dichotomy, so that there 
are $2^n$ subgroups. This case arises frequently when so-called “factorial” experiments 
are being conducted to determine the effect of a treatment which is either applied or withheld. 
The analysis of variance remains the same in principle, but of course the arithmetic 
becomes a good deal simpler. 

Example 23.5 (F. Yates, Supp. J.R.S.S., 1935, 2, 181) 

An area of ground was sown with peas and divided into 24 plots in the manner shown 
in Table 23.12. The plots received, or did not receive, dressings of nitrogen (N), phosphate 
(P) and potash (K) in the manner shown, the yields in pounds being given in the table. 

TABLE 23.12 


Yields of Peas and Manurial Treatments on 24 Plots 



There is some purpose here in the alternation of treatments, but that need not concern us 
for the present. We have 24 observations in four classes, viz. blocks (3), nitrogen (2), 
phosphate (2) and potash (2), giving 3 × 2 × 2 × 2 = 24 records. 

Condensing the table by adding blocks we get the following: 

No treatment     N       P       K       NP      NK      PK      NPK     Total 
    154.3      191.3   163.0   156.0   173.8   164.0   151.5   163.1    1317.0 

Condensing according to the three treatments we have 

              N       not-N    Totals 
K           327.1     307.5     634.6 
not-K       365.1     317.3     682.4 

Totals      692.2     624.8    1317.0 
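When every classification is a dichotomy, each sum of squares with one degree of freedom is the square of a single contrast of totals divided by the number of observations. The sketch below, in Python, applies this to the four cell totals of the nitrogen and potash table; the names are our own and the remaining terms of the analysis are not attempted.

```python
# Cell totals of the N by K table (each is the sum of 6 plots).
n_and_k, n_not_k = 327.1, 365.1
k_not_n, neither = 307.5, 317.3
n_plots = 24

def contrast_ss(contrast):
    """Sum of squares (1 d.f.) from a contrast of totals in a 2^n layout."""
    return contrast ** 2 / n_plots

ss_n = contrast_ss((n_and_k + n_not_k) - (k_not_n + neither))   # N against not-N
ss_k = contrast_ss((n_and_k + k_not_n) - (n_not_k + neither))   # K against not-K
ss_nk = contrast_ss((n_and_k + neither) - (n_not_k + k_not_n))  # interaction NK

print(round(ss_n, 3), round(ss_k, 3), round(ss_nk, 3))
```

The three values agree, to within a unit in the last place, with the entries for N, K and NK in the analysis of variance table for these data.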


We omit the remaining calculations. The analysis in its final form is given in 
Table 23.13. 

TABLE 23.13 

Analysis of Variance of the Data of Table 23.12 

                         Sums of Squares.   d.f.   Quotient. 
Between blocks (B)           177.803          2      88.90 
Between N                    189.282          1     189.28 
Between P                      8.402          1       8.40 
Between K                     95.202          1      95.20 
Interaction BN                94.255          2      47.13 
Interaction BP                 2.260          2       1.13 
Interaction BK                23.685          2      11.84 
Interaction NP                21.281          1      21.28 
Interaction NK                33.134          1      33.13 
Interaction PK                 0.481          1       0.48 
Interaction BNP               25.302          2      12.65 
Interaction BNK               36.004          2      18.00 
Interaction BPK                3.782          2       1.89 
Interaction NPK               37.003          1      37.00 
Residual (BNPK)              128.489          2      64.24 

Totals                       876.365         23 

We have carried out the analysis in full so as to illustrate the arithmetical process 
for a four-way classification, but we may note at once that it is unduly elaborate. There 
are only 24 observations in the data and we cannot expect them to provide all the answers 
to the questions which we could frame as to the significance of the various constituent 
items in the analysis. This is borne out by the z-test. The residual variance is 64.24 
with two degrees of freedom. For $\nu_1 = 1$, $\nu_2 = 2$ the variance ratio at the 1-per-cent 
point is 98.49 and that for $\nu_1 = 2$, $\nu_2 = 2$ at the same point is 99.00. Only values greater 
than about 100 times 64.24 or less than 1/100th of that value would thus be significant. 
Only the interaction PK falls outside this range, and even this, among so many, can hardly 
be regarded as significant. 

The inquiry is not, however, completely frustrated. Since the second-order interactions 
are not significant, we amalgamate them with the residual to give a remainder 
sum of squares of 230.580 with nine d.f. and a quotient of 25.62. It will now be found 
that among the first-order interactions only two are significant, PK and BP being too 
small. Had they been too large we might have attributed some genuine significance to 
this result, but it is not very plausible to suppose that there is a “real” interaction between 
blocks and phosphate, or that phosphate and potash inhibit each other's action. The 
differences from expectation are more probably due to individual soil variation from plot 
to plot. 

If we accept the first-order interactions as not significant, we may amalgamate them 
with the remainder to give the following: 

                 Sum of Squares.   d.f.   Quotient. 
Blocks               177.803         2      88.90 
N                    189.282         1     189.28 
P                      8.402         1       8.40 
K                     95.202         1      95.20 
Remainder            405.676        18      22.54 

Totals               876.365        23 

Here the P-quotient is not significant, but the variance ratio for blocks, 3.94, is near the 
5-per-cent point. The N-quotient will be found to be significant at the 1-per-cent point, 
the K-quotient near to the 5-per-cent point. Our conclusion is that there is strong indication 
that nitrogen influenced the yield, some indication that potash did so, and little indication 
that phosphates did so; and that there is ground for suspecting heterogeneity in the 
soil partly because of the difference between blocks and partly from some of the first-order 
interactions. 
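The two stages of amalgamation in this example are simple additions of sums of squares and of degrees of freedom. The following Python sketch repeats them from the figures of the analysis of variance table for these data, and forms the variance ratios of the four zero-order terms against the final remainder; the helper name is our own.

```python
def pool(items):
    """Amalgamate (sum of squares, d.f.) pairs; return (SS, d.f., quotient)."""
    ss = sum(s for s, _ in items)
    df = sum(d for _, d in items)
    return ss, df, ss / df

# Second-order interactions BNP, BNK, BPK, NPK and the residual BNPK.
second_order = [(25.302, 2), (36.004, 2), (3.782, 2), (37.003, 1), (128.489, 2)]
# First-order interactions BN, BP, BK, NP, NK, PK.
first_order = [(94.255, 2), (2.260, 2), (23.685, 2),
               (21.281, 1), (33.134, 1), (0.481, 1)]

stage1 = pool(second_order)
stage2 = pool(second_order + first_order)

quotients = {"blocks": 88.90, "N": 189.28, "P": 8.40, "K": 95.20}
ratios = {name: q / stage2[2] for name, q in quotients.items()}

print([round(v, 3) for v in stage1], [round(v, 3) for v in stage2])
print({name: round(r, 2) for name, r in ratios.items()})
```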

In this case, of course, we knew already more or less what was to be expected of these 
data and are the readier to accept the conclusions on that account. Had we known nothing 
of the effect of fertilisers on leguminous crops our conclusions on such slender evidence 
must have been very tentative indeed, particularly if we wished to extend them to peas 
grown on other soils under different climatic conditions with different amounts of fertiliser. 


Example 23.6 (C. E. Gould and W. M. Hampton, Supp. J.R.S.S., 1936, 3, 137) 

In the manufacture of optical glass there appear small bubbles known as “seed”, 
which constitute a defect. The glass is made in “pots” which take about a year to prepare, 
and are run continuously over long periods when once started. There are two pots 
to a furnace and materials are introduced into a pot from time to time which, after fusion, 
provide a “ run ” of glass. Each run provides several days’ work, one day’s work being 
known as a “ journey ”. At each journey quantities of glass are drawn from the pot and 
blown into “ cylinders ”, there being about 18 or 20 to the journey. For the purposes of 
the experiment three cylinders were chosen, the third, tenth and sixteenth, and pieces of 
regular size cut from them for examination as to frequency of seed. The first five journeys 
of each of five runs were sampled. 

We have here a four-way classification 2 (pots) × 5 (runs per pot) × 5 (journeys per 
run per pot) × 3 (cylinders per journey per run per pot). The actual dates of the runs 
were February 16th, May 23rd, June 12th, September 1st and December 6th, so that the 
manufacturing period covered about ten months. We shall assume that the glass was 
of the same type throughout, although in actual fact it was different in one or two cases, 
but not sufficiently different to affect the analysis. 

The topic of main interest here is whether the frequency of seed varies significantly 
according to the four factors concerned. If so, the alteration of manufacturing conditions 
may improve the wastage due to seed; but if not (and the variation is the kind of thing 
which can be accounted for as chance fluctuation in sampling from a homogeneous population) 
there is little hope of improvement except perhaps by a radical alteration in the 
process affecting all pots, runs and journeys alike. 


TABLE 23.14 

Frequency of “Seed” in Samples of Glass 

                         Pot 1.                       Pot 2. 
                Cyl. 1   Cyl. 2   Cyl. 3     Cyl. 1   Cyl. 2   Cyl. 3 
Run 1   J 1        47       56      100         52       61       88 
        J 2        55       89       93         49       62       97 
        J 3        35       57       56         34       60       72 
        J 4        78       67      113         47       93      118 
        J 5        33       40      128         16       29      130 

Run 2   J 1        52       66       36         65       80       40 
        J 2        21       61       49        122       97       79 
        J 3        31       39       25         45       54       72 
        J 4        43       72       52        109      120       80 
        J 5        37       51       67         67       85       63 

Run 3   J 1        50       61       60         75      139      130 
        J 2        33       27       49         46       58       63 
        J 3        24       39       24         15       33       39 
        J 4        18       18       43         22       16       19 
        J 5        28       42       28         27       19       22 

Run 4   J 1        24       34       43         46       66       24 
        J 2        24       49       42         40      117      105 
        J 3        21       21       51         30       28       34 
        J 4        21       69       48         36       64       53 
        J 5        76       48       42         39       60       78 

Run 5   J 1        31       54       40         19       93       36 
        J 2        34       24       46         16       12        2 
        J 3       120      122      120         33       58      107 
        J 4       109      119      120         25       63       90 
        J 5        69       49       60         34       43       30 

Before plunging into the analysis of variance it is as well to look over the data to see 
whether they themselves suggest any lines of inquiry. We observe considerable variability 
from journey to journey within the same run, J 3 and J 4 of run 5 being conspicuous 
in pot 1; and in run 1 the numbers of seed appear to increase from cylinder 1 to cylinder 3 
in a rather exceptional way. The runs themselves seem to differ materially. Prior considerations 
also suggested an examination of the way in which frequency of seed varied 
between pots, since they were chosen so as to differ substantially in constitution. 

A complete analysis of variance of the data is as follows:— 


TABLE 23.15 

Analysis of Variance of the Data of Table 23.14. 

                          Sums of Squares.   d.f.   Quotient. 
Between pots (P)                 898           1       898 
Between runs (R)              14,059           4     3,515 
Between journeys (J)           4,355           4     1,089 
Between cylinders (C)         10,631           2     5,315 
Interaction PR                16,133           4     4,033 
Interaction PJ                 4,081           4     1,020 
Interaction PC                   587           2       293 
Interaction RJ                45,934          16     2,871 
Interaction RC                11,626           8     1,453 
Interaction JC                 2,540           8       317 
Interaction PRJ                9,711          16       607 
Interaction RJC               12,472          32       390 
Interaction JCP                1,656           8       207 
Interaction CPR                1,862           8       233 
Residual (PRJC)                8,110          32       253 

Totals                       144,655         149 

The second-order interactions will be found non-significant, so we amalgamate with 
the residual, giving a sum of squares 33,811, d.f. 96, quotient 352. 
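The amalgamation may be verified as follows. In this Python sketch the sum of squares for PRJ is taken as 9,711, the value consistent with its quotient of 607 on 16 d.f. and with the total of the table; this reading, and the variable names, are our own assumptions. The ratios of the first-order quotients to the pooled quotient are printed for inspection only, no significance points being computed.

```python
# (sum of squares, d.f.) for the second-order interactions and the residual.
pooled_terms = {
    "PRJ": (9711, 16),    # assumed reading, see the note above
    "RJC": (12472, 32),
    "JCP": (1656, 8),
    "CPR": (1862, 8),
    "PRJC": (8110, 32),
}
pooled_ss = sum(ss for ss, _ in pooled_terms.values())
pooled_df = sum(df for _, df in pooled_terms.values())
pooled_quotient = pooled_ss / pooled_df

first_order_quotients = {"PR": 4033, "PJ": 1020, "PC": 293,
                         "RJ": 2871, "RC": 1453, "JC": 317}
ratios = {name: q / pooled_quotient for name, q in first_order_quotients.items()}

print(pooled_ss, pooled_df, round(pooled_quotient))
print({name: round(r, 2) for name, r in ratios.items()})
```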

It then appears that of the first-order interactions PR, RJ and RC are significant and 
PJ may be so. There is beginning to appear evidence of heterogeneity, and that of a rather 
complicated kind. It seems that pots are interacting with runs, runs with journeys and 
runs with cylinders. 

Taking 352 as the quotient, we find that except for P the zero-order interactions are 
significant. The five R-means are 68.50, 62.67, 42.23, 47.77 and 59.27, so that the variation 
of runs is not a simple rise or fall, which could have been explained as a time-effect. The 
five J-means are 58.93, 55.37, 49.97, 64.83 and 51.33, again not a regular effect. The 
C-means are 44.46, 59.68 and 64.12, which are significantly different. Inspection of the 
table suggests that the first run is the source of the trouble. 
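The class means and the four zero-order sums of squares can be recovered directly from the 150 readings. The Python sketch below holds the data of Table 23.14 as we read them; the entry for run 4, journey 5, pot 1, cylinder 1 is taken as 76, this being the value that reproduces the quoted means, and is an assumption on our part. The layout and function names are likewise our own.

```python
# table[run][journey] = (pot 1: cyl. 1, 2, 3; pot 2: cyl. 1, 2, 3)
table = [
    [(47, 56, 100, 52, 61, 88), (55, 89, 93, 49, 62, 97), (35, 57, 56, 34, 60, 72),
     (78, 67, 113, 47, 93, 118), (33, 40, 128, 16, 29, 130)],
    [(52, 66, 36, 65, 80, 40), (21, 61, 49, 122, 97, 79), (31, 39, 25, 45, 54, 72),
     (43, 72, 52, 109, 120, 80), (37, 51, 67, 67, 85, 63)],
    [(50, 61, 60, 75, 139, 130), (33, 27, 49, 46, 58, 63), (24, 39, 24, 15, 33, 39),
     (18, 18, 43, 22, 16, 19), (28, 42, 28, 27, 19, 22)],
    [(24, 34, 43, 46, 66, 24), (24, 49, 42, 40, 117, 105), (21, 21, 51, 30, 28, 34),
     (21, 69, 48, 36, 64, 53), (76, 48, 42, 39, 60, 78)],   # 76 is an assumed reading
    [(31, 54, 40, 19, 93, 36), (34, 24, 46, 16, 12, 2), (120, 122, 120, 33, 58, 107),
     (109, 119, 120, 25, 63, 90), (69, 49, 60, 34, 43, 30)],
]

# Flatten to (pot, run, journey, cylinder, value) records.
records = [(p, r, j, c, row[3 * p + c])
           for r, run in enumerate(table)
           for j, row in enumerate(run)
           for p in range(2)
           for c in range(3)]

n_obs = len(records)
grand = sum(rec[4] for rec in records)
correction = grand ** 2 / n_obs

def class_totals(axis, levels):
    """Totals of the one-way table obtained by condensing on one axis."""
    totals = [0] * levels
    for rec in records:
        totals[rec[axis]] += rec[4]
    return totals

def between_ss(totals):
    """Zero-order sum of squares from a one-way table of totals."""
    per_class = n_obs // len(totals)
    return sum(t * t for t in totals) / per_class - correction

axes = {"P": (0, 2), "R": (1, 5), "J": (2, 5), "C": (3, 3)}
totals = {name: class_totals(axis, levels) for name, (axis, levels) in axes.items()}
ss = {name: between_ss(t) for name, t in totals.items()}

print(totals["R"], totals["J"], totals["C"])
print({name: round(v) for name, v in ss.items()})
```

Dividing the run and journey totals by 30 and the cylinder totals by 50 gives the means quoted above.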

With data as heterogeneous as these it is rather difficult to set up a plausible hypothesis 
to test. The interactions of first order suggest that no simple additive effects of the four 
factors will explain observation, and if these terms are used as denominators in tests of 
variance ratios the variation between classes appears on the whole non-significant on the 
usual hypotheses. The analysis, then, suggests several subjects for inquiry as concerns 
the homogeneity of the data, but does not suggest any simple explanation of the observed 
figures. The reader may care to refer to the original paper for a more complete discussion 
of the subject. 
23.29. Perhaps we may pause at this point to review progress. We have seen 
that for an n-way classification of the special type wherein each subclass contains a single 
member, the sum of squares of all observations about their mean can be exhibited as the 
sum of a number of such sums. On the hypothesis of normality and homogeneity each 
constituent sum of squares, on division by its appropriate number of degrees of freedom, 
gives an estimator of the parent variance, and each is distributed as $\chi^2$ independently of 
the others. The hypothesis of homogeneity can then be tested in Fisher's z-distribution, 
subject to the adoption of a conservative attitude where many tests are made on the same 
data. If the hypothesis is rejected we may replace it by a simple form in which the effects 
of the different classes are additive, provided that the interactions are not significant. 
The particular ratio chosen for a test depends on the hypothesis concerned, and it is important 
to have a clear idea of the exact question to which an answer is sought. 

23.30. In the next chapter we shall consider the case when the numbers in different 
subclasses are not equal, discuss the additive hypothesis in more detail, examine the relationship 
of variance- and regression-analysis, and extend our results to the analysis of covariance. 
We conclude this chapter by an examination of the important question: what can be 
done with the analysis of variance when the variation is not normal? 

Non-normal Data 

23.31. The analysis of a sum of squares into its constituent sums can, of course, be 
undertaken in all circumstances, but the various quotients may not continue to provide 
unbiassed estimators of the parent variance if the population is not normal. What is 
equally serious, the constituent sums of squares may not be distributed independently. 
Thus, when parent normality cannot be assumed, the quotients in the analysis table are 
no longer equal within sampling limits and their ratio is distributed in unknown form; and 
even if the form were known it would probably depend on parent parameters and hence 
fail to provide an exact test of significance. 

The problem has been considered in four ways: 

(a) Sampling experiments have been undertaken to see how far moderate deviation 
from normality affects the z-distribution; 

(b) Attempts have been made to find transformations of the variate to throw the 
parent distributions into forms with equal variances, at least approximately, 
before the analysis is applied; 

(c) By introducing a randomising process into the data before they are collected, 
attempts have been made to preserve the z-distribution as a close approximation; 
this amounts to a change in the nature of the inference, as we shall see below; 

(d) Tests have been found which can be applied to ranked data irrespective of the 
parent form; this approach is a particular case of (c), but seems to merit special 
mention. 

We proceed to consider these four possibilities. 

23.32. The arithmetic entailed by a single analysis of variance, even in simple cases,
implies that an extensive sampling inquiry into the distribution of z in non-normal populations
would be a very formidable undertaking. E. S. Pearson (1931b) has studied in some
detail the case of a one-way classification with unequal numbers, when the distribution
of z becomes equivalent to that of the correlation ratio $\eta^2$. Six populations were chosen,
characterised by the following values:

$\beta_1 = 0$, $\beta_2 = 2.50$ (symmetrical platykurtic);

$\beta_1 = 0$, $\beta_2 = 4.1$ (symmetrical leptokurtic);

$\beta_1 = 0$, $\beta_2 = 7.05$ (symmetrical leptokurtic);

$\beta_1 = 0.2$, $\beta_2 = 3.3$ (skew, Type III);

$\beta_1 = 0.49$, $\beta_2 = 3.72$ (skew, Type III);

$\beta_1 = 0.99$, $\beta_2 = 3.83$ (very skew, Type I, with abrupt start).

The results suggested that for this range of $\beta_1$ and $\beta_2$ the distribution of z is adequately
represented by Fisher’s distribution, and that therefore the homogeneity test may be
applied. The case when the variation changed from group to group was not considered.
It was also concluded that “it seems probable that the more elaborate forms of analysis
of variance are also of fairly wide application”.

Some work by Eden and Yates (1933) is often referred to as experimental confirmation
of the same kind, but in fact it was carried out with rather a different object, that of confirming
the z-test for data under randomisation (see below, 23.36).

Variate Transformations 

23.33. Suppose $\xi$ is a new variate $\xi(x)$. Then approximately we shall have

$$\delta\xi = \frac{d\xi}{dx}\,\delta x, \qquad \operatorname{var}\xi = \left(\frac{d\xi}{dx}\right)^{2} \operatorname{var} x. \tag{23.53}$$

If now the parent variance of the x-distribution is related in some known manner to the
mean, say $f(m) = v$, we have, as a further approximation, if x varies about m by small quantities,

$$\operatorname{var}\xi = \left(\frac{d\xi}{dx}\right)^{2} f(x). \tag{23.54}$$

Now we wish $\xi$ to have a constant variance, say $\lambda^2$, and if this is so,

$$\frac{d\xi}{dx} = \frac{\lambda}{\sqrt{f(x)}}, \qquad\text{or}\qquad \xi = \int \frac{\lambda\,dx}{\sqrt{f(x)}}. \tag{23.55}$$

Although this expression is arrived at by approximation we are entitled to hope that
the variate $\xi$ will have almost constant variance, and at any rate a more stable variance
than x.

For instance, if the original variation is thought to be of the Poisson type we have
$f(x) = x$, and from (23.55) are led to consider the transformation

$$\xi = \int \frac{\lambda\,dx}{\sqrt{x}} = \sqrt{x}, \tag{23.56}$$

if we choose $\lambda$ to be ½. Similarly, if the variation is of the binomial type with variance
$p(1 - p)$ we have

$$\xi = \int \frac{\lambda\,dp}{\sqrt{\{p(1-p)\}}} = \sin^{-1}\sqrt{x}, \tag{23.57}$$

on suitable choice of $\lambda$ (again $\lambda = ½$), x here being the observed proportion.

23.34. These transformations are designed to “stabilise” the variance. They do
not necessarily bring the variate closer to normality, though in some cases they will do so;
we have, for instance, seen that $\sqrt{\chi^2}$ tends to normality quicker than $\chi^2$ (12.7). The
following values (Bartlett, 1936d) illustrate the way in which the square-root transformation
stabilises the variance of a Poisson distribution:

   Mean m     Variance of Poisson      Variance of Poisson
              variate $\sqrt{x}$       variate $\sqrt{(x + ½)}$

     0.0            0.000                    0.000
     0.5            0.310                    0.102
     1.0            0.402                    0.160
     2.0            0.390                    0.214
     3.0            0.340                    0.232
     4.0            0.306                    0.240
     6.0            0.276                    0.245
     9.0            0.263                    0.247
    12.0            0.259                    0.248
    15.0            0.256                    0.248

The term ½ in the third column was added by Bartlett on the analogy of a continuity
correction. For m > 3 the variance is evidently quite stable.
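
The entries of this table can be checked by direct summation over the Poisson frequencies. The following is a minimal sketch in Python, not Bartlett's own method of computation; the function name `variance_of_root` and the truncation tolerance `tail` are illustrative choices made here.

```python
import math


def variance_of_root(m, shift=0.0, tail=1e-16):
    """Variance of sqrt(X + shift) when X is a Poisson variate with mean m.

    The expectation is summed term by term until the Poisson frequency
    has passed the mean and fallen below `tail`.
    """
    prob = math.exp(-m)        # P(X = 0)
    k = 0
    first = 0.0                # running E[sqrt(X + shift)]
    second = 0.0               # running E[X + shift]
    while True:
        first += prob * math.sqrt(k + shift)
        second += prob * (k + shift)
        if k > m and prob < tail:
            break
        k += 1
        prob *= m / k          # P(X = k) obtained from P(X = k - 1)
    return second - first ** 2


# Print the two variance columns for a few of the tabulated means.
for mean in (0.5, 1.0, 2.0, 4.0, 9.0, 15.0):
    print(mean,
          round(variance_of_root(mean), 3),
          round(variance_of_root(mean, shift=0.5), 3))
```

Both columns approach ¼, the value $\lambda^2$ implied by the choice $\lambda = ½$ in the square-root transformation.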

23.35. If now, having stabilised the variance, we carry out an analysis in the ordinary
way, our residual sums of squares divided by the appropriate degrees of freedom will continue
to be unbiassed estimates of the common variance v, even if there are differences
between the means of the classes. Instead of assuming as part of the hypothesis that the
different classes are distributed with the same variance, we have transformed the variate
so that this shall be so, at least to a close approximation. Relying further on the result
that the transformed variates approximate to normality, or that if they do not the difference
will not seriously vitiate the z-test, we may apply that test to the transformed data
in the usual way.


Example 23.7 (Bartlett, 1936d)

Table 23.16 shows the number of wheat seeds out of 50 which failed to germinate in
four repetitions of an experiment with different treatments.

TABLE 23.16

Germination of Wheat Seeds

 Number of                    Number of Treatment
 Experiment      1     2     3     4     5     6     7     Totals

     1          10    11     8     9     7     6     9       60
     2           8    10     3     7     9     3    11       51
     3           5    11     2     8    10     7    11       54
     4           1     6     4    13     7    10    10       51

   Totals       24    38    17    37    33    26    41      216


In point of fact, treatment 7 was a repetition of treatment 6, the others being different. 
The point of interest is whether the treatments exert any effect on germination. We shall 
not inquire into any differences between experiments (which appear to be negligible from 
the row totals) and shall accordingly consider this as a one-way classification into seven 
classes, four numbers to the class. 

The presumption is that in any given class the variation is of the binomial type. We
might apply the $\sin^{-1}\sqrt{x}$ transformation, but will adopt instead an ad hoc square-root
transformation obtained as follows:

We have

$$v = np(1 - p).$$

Suppose now that $p = p_0 + \delta$ where $\delta$ is small. Then, neglecting $\delta^2$,

$$v = n(p_0 + \delta - p_0^2 - 2p_0\delta)$$
$$\;\; = n\{(1 - 2p_0)(p - p_0) + p_0 - p_0^2\}$$
$$\;\; = np(1 - 2p_0) + np_0^2.$$

If we now put

$$\xi = \sqrt{(x + ½ + k)}, \qquad\text{where}\qquad k = \frac{np_0^2}{1 - 2p_0}$$

and x is the observed frequency, then $\xi$ will tend to have constant variance.

In our example the total frequency is 216 out of 1400 seeds, so that we may take as
an estimate of $p_0$ the ratio 216/1400 = 0.15. The transformed variate then becomes

$$\sqrt{\left\{x + ½ + \frac{50\,(0.0225)}{0.70}\right\}} = \sqrt{(x + 2)}, \text{ approximately.}$$

On this basis the transformed variate-values are:

TABLE 23.17

Transformed Variates of Table 23.16

 Number of                            Number of Treatment
 Experiment      1        2        3        4        5        6        7       Totals

     1         3.464    3.606    3.162    3.317    3.000    2.828    3.317     22.694
     2         3.162    3.464    2.236    3.000    3.317    2.236    3.606     21.021
     3         2.646    3.606    2.000    3.162    3.464    3.000    3.606     21.484
     4         1.732    2.828    2.449    3.873    3.000    3.464    3.464     20.810

   Totals     11.004   13.504    9.847   13.352   12.781   11.528   13.993     86.009


The analysis of variance is:

                            Sums of Squares     d.f.     Quotient

 Between treatments              3.486            6       0.581
 Residual                        4.316           21       0.206

 Totals                          7.802           27

The uncorrected sum of squares of the transformed values is particularly easy to obtain,
being the sum of the original variates plus twice the number of variate-values
(216 + 56 = 272).

The variance ratio, 2.8, is barely significant, being just beyond the 5 per cent point.
There is little evidence that treatments are exerting any effect on germination, since a
comparison of treatments 6 and 7 (which are the same) indicates that such “significance”
as exists may be due to heterogeneity in the seed.
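
The arithmetic of this example can be retraced in a few lines. The sketch below, in Python, applies the $\sqrt{(x + 2)}$ transformation to the counts of Table 23.16 and carries out the one-way analysis; the helper `one_way_anova` is a name introduced here for illustration. Because it keeps full precision instead of three-decimal roots, its sums of squares differ from the tabulated ones in the third decimal place.

```python
import math

# Seeds failing to germinate (rows: experiments 1-4, columns: treatments 1-7).
counts = [
    [10, 11, 8, 9, 7, 6, 9],
    [8, 10, 3, 7, 9, 3, 11],
    [5, 11, 2, 8, 10, 7, 11],
    [1, 6, 4, 13, 7, 10, 10],
]


def one_way_anova(classes):
    """Return (between-class SS, residual SS, variance ratio)."""
    values = [v for cls in classes for v in cls]
    n_total = len(values)
    correction = sum(values) ** 2 / n_total
    total_ss = sum(v * v for v in values) - correction
    between_ss = sum(sum(cls) ** 2 / len(cls) for cls in classes) - correction
    residual_ss = total_ss - between_ss
    df_between = len(classes) - 1
    df_residual = n_total - len(classes)
    ratio = (between_ss / df_between) / (residual_ss / df_residual)
    return between_ss, residual_ss, ratio


# Transform every count, then regroup so that each class is one treatment.
transformed = [[math.sqrt(x + 2) for x in row] for row in counts]
by_treatment = [list(col) for col in zip(*transformed)]

between_ss, residual_ss, ratio = one_way_anova(by_treatment)
print(round(between_ss, 3), round(residual_ss, 3), round(ratio, 2))
```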

Randomisation 

23.36. Consider a two-way classification of pq members, the observed value of the
jth A-member of the kth B-class being $x_{jk}$. Following the line already considered in 21.48,
we will consider the z-distribution in the population of values obtained by permuting the
members in any A-class in all possible ways. There will thus be $(q!)^p$ possible values of
z, all based on the observed values. We have already considered a case of this kind in
dealing with the problem of m rankings (16.29) and we shall follow the same procedure
in solving the more general problem.

Let the values be arrayed as

$$\begin{matrix} x_{11} & x_{12} & \cdots & x_{1q} \\ x_{21} & x_{22} & \cdots & x_{2q} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pq} \end{matrix} \tag{23.58}$$

If $S_R$ is the sum of squares between rows, $S_C$ that between columns and S the total, we
know that in the ordinary case considered earlier in the chapter, $S_C$ is distributed as $v\chi^2$
with $q - 1$ d.f., and $S - S_R - S_C$ as $v\chi^2$ with $(p - 1)(q - 1)$ d.f. It follows that

$$\frac{S_C}{S - S_R} = W, \text{ say}, \tag{23.59}$$

is distributed in the Type I form

$$dF \propto W^{½(q-3)}\,(1 - W)^{½(p-1)(q-1)-1}\,dW. \tag{23.60}$$

It is easier to work with W than with z, but there is of course no difficulty in passing from
one to the other.

We proceed to find the first four moments of W in the population of $(q!)^p$ values obtained
by permuting the rows of (23.58) in all possible ways.



23.37. If in (23.58) we increase the members of any row by a constant a, it is easily
seen that $S_C$ and $S - S_R$ remain unaffected, and hence so does W. Thus we may take
the mean of each row to be zero and then $S_R = 0$. With this origin we have

$$W = \frac{S_C}{S} = \frac{\sum_j \left(\sum_i x_{ij}\right)^2}{p \sum_{i,j} x_{ij}^2}. \tag{23.61}$$

If now

$$R_{ik} = \sum_{j=1}^{q} x_{ij}\,x_{kj} \tag{23.62}$$

and the k-statistics of the q values $x_{ij}$, $j = 1 \ldots q$, are written $k_{i1}$, $k_{i2}$, etc., and

$$U = \sum_{i<k} R_{ik}, \tag{23.63}$$

we find

$$W = \frac{1}{p} + \frac{2U}{p\,(q - 1)\sum_i k_{i2}}, \tag{23.64}$$

$$E(R_{ik}) = 0, \tag{23.65}$$

$$E(R_{ik}^2) = (q - 1)\,k_{i2}k_{k2}, \tag{23.66}$$

$$E(R_{ik}^3) = \frac{(q - 1)(q - 2)}{q}\,k_{i3}k_{k3}, \tag{23.67}$$

$$E(R_{ik}^4) = \frac{3(q - 1)^3}{q + 1}\,k_{i2}^2k_{k2}^2 + \frac{(q - 1)(q - 2)(q - 3)}{q\,(q + 1)}\,k_{i4}k_{k4}. \tag{23.68}$$

Then, for the moments of U,

$$E(U) = 0, \tag{23.69}$$

$$E(U^2) = (q - 1)\,{\sum}' k_{i2}k_{k2}, \tag{23.70}$$

$$E(U^3) = 6(q - 1)\,{\sum}' k_{i2}k_{k2}k_{l2} + \frac{(q - 1)(q - 2)}{q}\,{\sum}' k_{i3}k_{k3}, \tag{23.71}$$

$$E(U^4) = \frac{3(q - 1)^3}{q + 1}\,{\sum}' k_{i2}^2k_{k2}^2 + \frac{(q - 1)(q - 2)(q - 3)}{q\,(q + 1)}\,{\sum}' k_{i4}k_{k4}$$
$$\qquad + 3(q - 1)^2\left\{\left({\sum}' k_{i2}k_{k2}\right)^2 - {\sum}' k_{i2}^2k_{k2}^2\right\}$$
$$\qquad + \frac{12(q - 1)(q - 2)}{q}\,{\sum}' k_{i3}k_{k3}k_{l2} + 72(q - 1)\,{\sum}' k_{i2}k_{k2}k_{l2}k_{m2}, \tag{23.72}$$

where $\sum'$ denotes summation over values for which the subscripts are unequal and permutations
are not allowed.

Finally, for the moments of W we have

$$E(W) = \frac{1}{p}, \tag{23.73}$$

$$E(W - \bar{W})^2 = \frac{4\,{\sum}' k_{i2}k_{k2}}{p^2(q - 1)\left(\sum k_{i2}\right)^2}, \tag{23.74}$$

$$E(W - \bar{W})^3 = \frac{48\,{\sum}' k_{i2}k_{k2}k_{l2}}{p^3(q - 1)^2\left(\sum k_{i2}\right)^3} + \frac{8(q - 2)\,{\sum}' k_{i3}k_{k3}}{p^3 q\,(q - 1)^2\left(\sum k_{i2}\right)^3}, \tag{23.75}$$

$$E(W - \bar{W})^4 = \frac{48\left({\sum}' k_{i2}k_{k2}\right)^2}{p^4(q - 1)^2\left(\sum k_{i2}\right)^4} - \frac{96\,{\sum}' k_{i2}^2k_{k2}^2}{p^4(q - 1)^2(q + 1)\left(\sum k_{i2}\right)^4} + \frac{1152\,{\sum}' k_{i2}k_{k2}k_{l2}k_{m2}}{p^4(q - 1)^3\left(\sum k_{i2}\right)^4}$$
$$\qquad + \frac{16(q - 2)(q - 3)\,{\sum}' k_{i4}k_{k4}}{p^4(q + 1)\,q\,(q - 1)^3\left(\sum k_{i2}\right)^4} + \frac{192(q - 2)\,{\sum}' k_{i3}k_{k3}k_{l2}}{p^4(q - 1)^3 q\left(\sum k_{i2}\right)^4}. \tag{23.76}$$

These formulae can be derived in the manner of 16.33, but reference may be made to
Pitman (1938) for further details.
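
For a small table the permutation population can be enumerated outright, which gives a direct check on the mean (23.73) and the variance (23.74). The following Python sketch does this for a hypothetical 3 × 3 table (the numbers are invented for illustration, and the function names are introduced here); it builds all $(q!)^p$ arrangements and compares the enumerated moments with the formulae.

```python
import itertools
import statistics


def w_statistic(rows):
    """W = S_C / (S - S_R) for a p x q table given as a sequence of rows."""
    p, q = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (p * q)
    total = sum((x - grand) ** 2 for row in rows for x in row)
    s_rows = q * sum((sum(row) / q - grand) ** 2 for row in rows)
    col_means = [sum(row[j] for row in rows) / p for j in range(q)]
    s_cols = p * sum((c - grand) ** 2 for c in col_means)
    return s_cols / (total - s_rows)


def enumerated_moments(rows):
    """Mean and variance of W over all (q!)^p within-row permutations."""
    choices = [list(itertools.permutations(row)) for row in rows]
    values = [w_statistic(combo) for combo in itertools.product(*choices)]
    mean = sum(values) / len(values)
    var = sum((w - mean) ** 2 for w in values) / len(values)
    return mean, var


def formula_moments(rows):
    """Mean 1/p and the variance given by (23.74)."""
    p, q = len(rows), len(rows[0])
    k2 = [statistics.variance(row) for row in rows]   # second k-statistics
    pairs = sum(a * b for a, b in itertools.combinations(k2, 2))
    return 1 / p, 4 * pairs / (p ** 2 * (q - 1) * sum(k2) ** 2)


# Hypothetical data: three A-classes (rows) of three members each.
table = [[1.0, 4.0, 6.0], [2.0, 3.0, 9.0], [5.0, 5.5, 7.0]]
print(enumerated_moments(table))
print(formula_moments(table))
```

The enumeration grows as $(q!)^p$, so this check is practicable only for very small tables; that is why the moment formulae are needed.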

23.38. We now consider how far the first four moments of W, as found above, agree
with the first four moments of the distribution (23.60). The mean and variance of the
latter are

$$\frac{1}{p} \qquad\text{and}\qquad \frac{2(p - 1)}{p^2(pq - p + 2)}. \tag{23.77}$$

The means agree exactly. For the variances to agree we must have, from (23.74) and
(23.77),

$$\frac{4\,{\sum}' k_{i2}k_{k2}}{p^2(q - 1)\left(\sum k_{i2}\right)^2} = \frac{2(p - 1)}{p^2(pq - p + 2)}. \tag{23.78}$$

Writing

$$K = \frac{2\,{\sum}' k_{i2}k_{j2}}{\left(\sum k_{i2}\right)^2}, \tag{23.79}$$

we find that (23.78) is equivalent to

$$K = \frac{(p - 1)(q - 1)}{pq - p + 2}. \tag{23.80}$$

The ratio K may have any value from 0 to $(p - 1)/p$, the lower limit being approached when
one of the second k-statistics $k_{i2}$ is much larger than the others, the upper limit when they
are all equal. Hence all that can be said about the variance of W is that it is not greater
than $2(p - 1)/\{p^3(q - 1)\}$, and that it takes this value when the variance of each A-class is the same.

Turning to the third and fourth moments, we note that in many cases where the variation
is not too skew the quantities $k_{i3}$ and $k_{i4}$ will be negligible. A number of terms in
(23.75) and (23.76) may thus be neglected, but even those that remain are fairly complicated,
and it is difficult to say how far the distribution of W will approach the Type I
distribution (23.60). In practice the values may be worked out and compared. If there
is reasonable agreement, the z-distribution of the variance ratio will hold in the particular
population which we are considering.

23.39. A better approach is to find the Type I distribution which has the same first
two moments as W and to modify the z-test where necessary. It may be shown that when
K is not too small the third and fourth moments of W and the fitted Type I distribution
are in fairly good agreement, so that we may expect a good fit.

The Type I distribution with mean $1/p$ and variance $2K/\{p^2(q - 1)\}$ has the same mean and variance
as W by definition. Its third moment is easily seen to be

$$\frac{8K^2}{p^3(q - 1)^2}\cdot\frac{p - 2}{p - 1 + \dfrac{2K}{q - 1}}. \tag{23.81}$$

We have to see how far this differs from the actual third moment of W given by (23.75).
Now

$$3\,{\sum}' k_{i2}k_{k2}k_{l2} = \sum k_{i2}\,{\sum}' k_{k2}k_{l2} - \sum_{i \ne k} k_{i2}^2k_{k2}$$
$$\qquad = \sum k_{i2}\,{\sum}' k_{k2}k_{l2} - \left(\sum k_{i2}\sum k_{i2}^2 - \sum k_{i2}^3\right)$$
$$\qquad = \sum k_{i2}\left\{3\,{\sum}' k_{i2}k_{k2} - \left(\sum k_{i2}\right)^2\right\} + \sum k_{i2}^3,$$

and hence

$$\frac{6\,{\sum}' k_{i2}k_{k2}k_{l2}}{\left(\sum k_{i2}\right)^3} = 3K - 2 + \frac{2\sum k_{i2}^3}{\left(\sum k_{i2}\right)^3}. \tag{23.82}$$

Since all the k’s here concerned are positive,

$$\sum k_{i2}\sum k_{i2}^3 \ge \left(\sum k_{i2}^2\right)^2$$

and hence

$$\frac{\sum k_{i2}^3}{\left(\sum k_{i2}\right)^3} \ge \frac{\left(\sum k_{i2}^2\right)^2}{\left(\sum k_{i2}\right)^4} = (1 - K)^2. \tag{23.83}$$

Hence, from (23.82) and (23.83),

$$\frac{6\,{\sum}' k_{i2}k_{k2}k_{l2}}{\left(\sum k_{i2}\right)^3} \ge 3K - 2 + 2(1 - K)^2 = K^2\left(1 - \frac{1 - K}{K}\right). \tag{23.84}$$

Similarly, since

$$\frac{\sum k_{i2}^3}{\left(\sum k_{i2}\right)^3} \le \frac{\left(\sum k_{i2}^2\right)^{3/2}}{\left(\sum k_{i2}\right)^3} = (1 - K)^{3/2} \le (1 - K)\left(1 - ½K - \tfrac{1}{8}K^2\right),$$

it appears that

$$\frac{6\,{\sum}' k_{i2}k_{k2}k_{l2}}{\left(\sum k_{i2}\right)^3} \le K^2\,\frac{3 + K}{4}. \tag{23.85}$$

On comparing (23.75) and (23.81), and assuming that the second term in the former may
be neglected, we see that they differ by the factor whose limits we have found in (23.84)
and (23.85), namely

$$1 - \frac{1 - K}{K} \qquad\text{and}\qquad \frac{3 + K}{4}.$$

If K is not too small the limits are not very different from unity, and the third moments
are accordingly in fairly good agreement.
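
The two limits can be illustrated numerically. The Python sketch below takes a hypothetical set of second k-statistics (the values are invented here), computes K as in (23.79), and evaluates the ratio of $6\sum' k_{i2}k_{k2}k_{l2}/(\sum k_{i2})^3$ to $K^2$ together with the limits $1 - (1 - K)/K$ and $(3 + K)/4$. The function name is an illustrative choice.

```python
from itertools import combinations


def third_moment_factor(k2):
    """Return (K, lower limit, factor, upper limit) for second k-statistics k2.

    `factor` is 6 * sum' k_i k_k k_l / (sum k)^3 divided by K^2, the sums
    being taken over unordered sets of distinct subscripts.
    """
    total = sum(k2)
    big_k = 2 * sum(a * b for a, b in combinations(k2, 2)) / total ** 2
    triple = sum(a * b * c for a, b, c in combinations(k2, 3))
    factor = 6 * triple / total ** 3 / big_k ** 2
    lower = 1 - (1 - big_k) / big_k
    upper = (3 + big_k) / 4
    return big_k, lower, factor, upper


# Hypothetical row variances for p = 5 classes.
K, lower, factor, upper = third_moment_factor([1.0, 2.0, 4.0, 3.0, 2.5])
print(round(K, 4), round(lower, 4), round(factor, 4), round(upper, 4))
```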

In the same way but with rather more complicated algebra it may be shown that the 
fourth moments are in fair agreement. 

When all the rows are rankings, the case reduces to that considered in 16.33 et seq.,
and we have already seen that the distribution of W is closely approximated by the Type I
distribution in that case.


23.40. Suppose, now, that we have p classes of objects, one of each class belonging
to a second series of classes, q in number. As our hypothesis we will suppose that membership
of the q-classes is independent of the variate-values, so that we may suppose it to be
a matter of chance how the values in any p-class are distributed among the q-classes. On
this hypothesis the variance ratio will follow the z-form approximately (subject to the
conditions we have discussed above) in the population consisting of the $(q!)^p$ permutations
of observed values; and this will be so whether the parent is normal or not.

By shaping the inference in this way, and making it conditional, we are thus able to
apply the z-test even in cases of non-normality. The test of homogeneity still applies, but
of course the inference is rather different from the usual type. This point has not, perhaps,
been adequately emphasised in the past and there still seems to be confusion on the subject.

Randomised Blocks 

23.41. The principle of testing in a conditional population has received its chief 
applications in a certain type of agricultural experiment (and analogous cases in other 
fields), known as a randomised block experiment. We are given p blocks of land and wish 
to test the existence of differential effects among q treatments, e.g. manurial treatments, 
of a crop to be grown on it. We divide each block into q plots and grow the crops on each 
of the pq plots. In any one block we apply a different treatment to each of the q plots ; 
and we allocate the treatments among the plots at random. 

This randomisation is an essential part of the process. If the treatments exert no
effect the observed yields might have occurred in any order, and by making the inference
in the proper way we are able to test in the z-distribution without assuming parent normality
or the non-existence of fertility differences between plots of the same block. If,
of course, the parent is near to normality the test is strengthened. Had we not allocated
the treatments at random the use of the z-distribution would not have been valid in the
absence of normality (at least approximate) on the part of the parent.

23.42. It is of some importance to make clear the exact hypothesis which is being
tested in this approach, since misunderstandings on the point have led to some rather
heated controversy. If the treatments are numbered 1 to q, we consider the possible yield
on the plot j, k if it received the lth treatment, say $x_{jk}^{(l)}$. In actual fact only one of these
treatments was carried out; the other values of $x_{jk}^{(l)}$ are hypothetical and are based on
our conception of what would happen if the treatments were differently distributed. The
totality of values $x_{jk}^{(l)}$ form our hypothetical population. We are supposing that the
observed yields can be expressed as

$$x_{jk}^{(l)} = a_j + \xi_{jk}^{(l)},$$

where $a_j$ is an effect differing from block to block but constant within blocks, and $\xi_{jk}^{(l)}$ is the
“individual” plot effect which has a zero mean. The hypothesis we have considered in
arriving at the validity of the z-test in conditional inferences is that every treatment affects
every plot to the same extent, apart from the block effect $a_j$. In short, we suppose that
$\xi_{jk}^{(l)}$ is the same for all l. This is the hypothesis usually tested in data from randomised
blocks.

Neyman (1935a) proposed an alternative hypothesis, viz. that the mean effects of
treatments over all blocks were the same, on the ground that we are interested in average
treatment effects when testing fertilisers, not the effect on particular plots. The hypothesis
here is that $x_{\cdot\cdot}^{(l)} = x_{\cdot\cdot}$, which is not the same as before; and it appears from Neyman’s
analysis that the z-distribution under randomisation may not hold to such a satisfactory
approximation as in the former case. Once again we have to stress the importance of
gaining a clear idea of the hypothesis under test.

Example 23.8 (Eden and Yates, 1933; Pitman, 1938)

Eden and Yates considered some data, based on actual experience of heights of wheat
shoots, comprising eight classes of four, equivalent to the following measurements:

                                   Class
     1        2        3        4        5        6        7        8

    433      455     487½     407½     452½     257½     434½     475½
    429     419½      389     574½     436½     263½     526½     473½
    383      479     463½     477½      415      392      470     423½
    437     504½     469½     452½      418      426      532     481½

The variances of the eight classes, in units of 1/16th, are then found to be

7628; 15,702; 22,669; 59,732; 3666; 90,593; 26,297; 8672.

The quantity K of equation (23.79) is then found to be 0.7577. The quantity
$(p - 1)(q - 1)/(pq - p + 2)$ is 21/26 = 0.81. Thus (23.80) is approximately satisfied and we expect that the
z-distribution will be approximately reproduced by the data under random permutations.

This was confirmed by Eden and Yates in a sampling experiment on the data. 1000
sets of permutations were taken and z calculated for each. Agreement with expectation
was good.
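
The value of K quoted in this example can be recomputed from the measurements. The Python sketch below assumes that the fractional readings in the table are halves (an assumption made here, but one which reproduces all eight quoted variances when the divisor 4 is used and the result expressed in sixteenths). Since K is a ratio, the choice of divisor does not affect it.

```python
from itertools import combinations

# Heights of wheat shoots: eight classes of four (fractions taken as halves).
classes = [
    [433, 429, 383, 437],
    [455, 419.5, 479, 504.5],
    [487.5, 389, 463.5, 469.5],
    [407.5, 574.5, 477.5, 452.5],
    [452.5, 436.5, 415, 418],
    [257.5, 263.5, 392, 426],
    [434.5, 526.5, 470, 532],
    [475.5, 473.5, 423.5, 481.5],
]


def variance_in_sixteenths(values):
    """Mean squared deviation (divisor n), expressed in units of 1/16."""
    mean = sum(values) / len(values)
    return 16 * sum((v - mean) ** 2 for v in values) / len(values)


variances = [variance_in_sixteenths(c) for c in classes]

# K of (23.79): twice the sum of products of pairs over the squared total.
K = 2 * sum(a * b for a, b in combinations(variances, 2)) / sum(variances) ** 2

p, q = len(classes), len(classes[0])
target = (p - 1) * (q - 1) / (p * q - p + 2)     # right-hand side of (23.80)
print([round(v) for v in variances], round(K, 4), round(target, 4))
```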

Example 23.9 (Friedman, 1937)

A good example of data from populations which are probably far from normal is given
in Table 23.18, showing the standard deviations of expenditures on various items for seven
income-groups. The figures relate to families of wage-earners and lower salaried workers
in Minneapolis and St. Paul, U.S.A., in 1935-6.

TABLE 23.18

Standard Deviations of Expenditure on Certain Items of Families in Specified Income Groups.

(Figures in brackets are ranks.)

                                             Annual Family Income (dollars)
 Category of Expenditure     750-        1000-       1250-       1500-       1750-       2000-      2250-2500

 Housing                  100.3 (5)    68.4 (1)    89.5 (3)    77.9 (2)   100.0 (4)   108.2 (6)   184.9 (7)
 Household operation       42.2 (1)    44.3 (3)    60.9 (4)    73.9 (6)    43.9 (2)    61.7 (5)   102.3 (7)
 Food                      71.3 (1)    81.9 (2)   100.7 (7)    86.5 (3)   100.3 (5)    90.7 (4)   100.6 (6)
 Clothing                  37.6 (1)    60.0 (3)    57.0 (2)    60.8 (4)    71.8 (5)    83.0 (6)   117.1 (7)
 Furnishings, etc.         58.3 (2)    52.7 (1)    90.0 (6)    60.4 (3)   104.3 (7)    89.8 (5)    85.8 (4)
 Transportation            46.3 (1)    82.2 (2)   129.8 (3)   181.0 (6)   172.3 (5)   164.8 (4)   240.8 (7)
 Recreation                19.0 (1)    23.1 (2)    38.7 (3)    45.8 (4)    59.0 (7)    50.7 (5)    55.2 (6)
 Personal care              8.3 (1)     8.4 (2)     9.2 (3)    14.3 (6)    10.0 (4)    15.8 (7)    12.5 (5)
 Medical care              20.1 (1)    33.5 (2)    60.1 (4)    69.3 (5)   114.3 (7)    45.3 (3)   101.6 (6)
 Education                  3.2 (1)     4.1 (2)    12.7 (4)    18.9 (5)     8.9 (3)    41.5 (6)    66.3 (7)
 Community welfare          4.1 (1)    18.9 (5)     8.5 (2)    12.9 (3)    25.3 (7)    19.9 (6)    16.8 (4)
 Vocation                   7.7 (1)    11.2 (5)    10.4 (2)    10.9 (4)    10.5 (3)    14.0 (6)    14.4 (7)
 Gifts                      5.3 (1)    10.9 (2)    11.2 (3)    25.3 (4)    42.3 (5)    48.8 (6)    69.4 (7)
 Other                      6.0 (5)     5.0 (4)    22.2 (7)     2.5 (2)     6.2 (6)     1.0 (1)     4.0 (3)


In brackets we show the ranks of the figure for different income-groups for each
category of expenditure. We wish to know whether the standard deviations for each
category differ significantly for the different income levels. On the hypothesis that they
do not it is a matter of chance how the ranks fall.

The sums of ranks in each column are:

23, 36, 53, 57, 70, 70, 83.

The coefficient of concordance (vol. I, p. 411) is then $W = \dfrac{12S}{m^2(n^3 - n)}$, where m = 14,
n = 7 and S is the sum of squares of deviations of sums of ranks from the mean
$½\,m(n + 1) = 56$; we find that S = 2620 and W = 0.4774. We may test the significance
(vol. I, p. 419) by writing

$$z = ½\log_e \frac{(m - 1)W}{1 - W} = 1.24,$$

$$\nu_1 = (n - 1) - \frac{2}{m} = 5\tfrac{6}{7},$$

$$\nu_2 = (m - 1)\,\nu_1 = 76\tfrac{1}{7}.$$
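
The arithmetic of this test is easily retraced from the column sums of ranks alone. The following Python sketch uses only the figures quoted above; the variable names are chosen here for illustration.

```python
import math

rank_sums = [23, 36, 53, 57, 70, 70, 83]   # column totals of the ranks
m, n = 14, 7                               # m rankings of n income-groups

mean_sum = m * (n + 1) / 2                 # expected rank total, 56
S = sum((r - mean_sum) ** 2 for r in rank_sums)

# Coefficient of concordance and the z-transformation used for the test.
W = 12 * S / (m ** 2 * (n ** 3 - n))
z = 0.5 * math.log((m - 1) * W / (1 - W))
nu1 = (n - 1) - 2 / m
nu2 = (m - 1) * nu1

print(S, round(W, 4), round(z, 2), round(nu1, 3), round(nu2, 3))
```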

The value of z is highly significant, and we conclude that standard deviation is related to
size of income: the more money there is to spend, the more variable is the expenditure
on particular items.
NOTES AND REFERENCES


The idea of comparing variance between classes with the variance within classes in 
order to test homogeneity is found as early as Lexis (see footnote on page 119). Modern 
developments, and particularly the exact test of significance for normal parents, are due 
mainly to R. A. Fisher. Apart from papers by Irwin (1931 and 1934), connected accounts 
of the theory of variance analysis are hard to find, many points of theoretical interest being 
scattered among papers which are primarily practical. 

For the general theory and applications reference may be made to Fisher’s Statistical
Methods (1925a, 1944) and Design of Experiments (1935c, 1942), to a useful introductory
account by Goulden (1939), and to the writings of Yates, particularly his Design and Analysis
of Factorial Experiments (1937b).

On the question of randomisation in preserving the z-distribution see Eden and Yates
(1933), Welch (1937, 1938a), and Pitman (1938). References to work on ranking are given
at the end of Chapter 16.

For work on the distribution of the greatest of a set of variances see Fisher (1929a,
1940a), Cochran (1941), Stevens (1939a), Hartley (1938), and Finney (1941a). For further
work on the square-root and $\sin^{-1}$ transformations see Cochran (1940b), Beall (1942) and
Curtiss (1943).

The literature of this subject is now very large. Some further references are given 
at the end of the next chapter. 

EXERCISES 

23.1. If $x_j$ ($j = 1 \ldots n$) are a set of normal independent variates with variances
$1/w_j$, consider the transformation

$$u_k = \sum_{j=1}^{n} l_{kj}\,x_j\sqrt{w_j},$$

where the l’s are defined by

$$l_{1k} = \sqrt{(w_k/\Sigma w)}, \qquad k = 1 \ldots n$$

$$l_{jk} = \sqrt{\left\{\frac{w_j w_k}{\left(\sum_{i=1}^{j-1} w_i\right)\left(\sum_{i=1}^{j} w_i\right)}\right\}}, \qquad j = 2, 3, \ldots n; \quad k = 1, \ldots j - 1$$

$$l_{jj} = -\sqrt{\left\{\frac{\sum_{i=1}^{j-1} w_i}{\sum_{i=1}^{j} w_i}\right\}}, \qquad j = 2, 3, \ldots n$$

$$l_{jk} = 0, \qquad j = 2, 3, \ldots n - 1; \quad k = j + 1, \ldots n.$$

Show that the l’s are orthogonal and hence that

$$\sum_{k=1}^{n} u_k^2 = \sum_{k=1}^{n} w_k x_k^2$$

is distributed as $\chi^2$ with n degrees of freedom. Noting that $u_1 = \sum_{k=1}^{n} w_k x_k/\sqrt{\Sigma w}$ is distributed
normally with unit variance independently of $u_2 \ldots u_n$, show that

$$\sum_{k=1}^{n} w_k (x_k - \bar{x})^2$$

is distributed as $\chi^2$ with $n - 1$ degrees of freedom.
Hence derive the z-test for the analysis of variance with unequal numbers in a one-way
classification.

(Irwin, 1942.)

23.2. Verify the arithmetic in the analysis of variance of Example 23.5.

23.3. Verify the arithmetic in the analysis of variance of Example 23.6.

23.4. In a bivariate table with k rows (different rows corresponding to different
values of the x-variate) write

$$h = \frac{1}{\sigma^2}\sum_x n_x(\bar{y}_x - \bar{y})^2,$$

$$q = \frac{1}{\sigma^2}\sum_x (n_x s_x^2),$$

where $\sigma^2$ is the variance of the y variate, $s_x^2$ the variance, and $n_x$ the frequency in the row
with variate-value x. Thus

$$\frac{\eta_{yx}^2}{1 - \eta_{yx}^2} = \frac{h}{q},$$

and the ratio on the right is the variance-ratio in a one-way classification with unequal
numbers.

Show that, for any form of population,

$$E(h) = k - 1, \qquad E(q) = N - k,$$

$$\operatorname{var} h = 2(k - 1) + (\beta_2 - 3)\left[\sum\frac{1}{n_x} - \frac{2k}{N} + \frac{1}{N}\right],$$

$$\operatorname{var} q = 2(N - k) + (\beta_2 - 3)\left[\sum\frac{1}{n_x} + N - 2k\right],$$

$$\operatorname{cov}(h, q) = (\beta_2 - 3)\left[k - 1 + \frac{k}{N} - \sum\frac{1}{n_x}\right].$$

Hence, approximately, that

$$E\left(\frac{h}{q}\right) = \frac{E(h)}{E(q)}\left\{1 + \frac{\operatorname{var} q}{E^2(q)} - \frac{\operatorname{cov}(h, q)}{E(h)E(q)}\right\},$$

$$E\left(\frac{h}{q}\right)^2 = \frac{E^2(h)}{E^2(q)}\left\{1 + \frac{\operatorname{var} h}{E^2(h)} - \frac{4\operatorname{cov}(h, q)}{E(h)E(q)} + \frac{3\operatorname{var} q}{E^2(q)}\right\}.$$

In the case when all rows contain the same frequency, $n_x = N/k$, and then

$$E\left(\frac{h}{q}\right) = \frac{k - 1}{N - k}\left\{1 + \frac{2}{N - k}\right\},$$

$$\operatorname{var}\left(\frac{h}{q}\right) = \frac{2(k - 1)(N - 1)}{(N - k)^3}.$$

Hence show that the mean and variance of the variance-ratio are, to this order, independent
of the distribution of y, indicating that the z-test is not very sensitive to deviations from
normality.

(E. S. Pearson, 1931b. It is rather remarkable that the correlation of h and q, far from
disturbing the z-distribution, contributes to its stability.)
THE ANALYSIS OF VARIANCE—(2)

Estimation of Class-differences 

24.1. In the previous chapter we considered the analysis of variance mainly as the
provider of tests of homogeneity. We have now to examine in more detail the problem of
estimating class-effects, assuming that the homogeneity tests have shown them to exist.
We discuss in the first instance the case in which there is only one member in each subclass,
and for the sake of simplicity confine ourselves to a two-way classification, though
the theory is quite general.

The fundamental hypothesis to be examined is that the data may be expressed in
the form

$$x_{jk} = a_j + b_k + \xi_{jk}, \tag{24.1}$$

where $a_j$ and $b_k$ represent class-effects and $\xi$ is a random normal variate with zero mean.
Our analysis of variance will have shown whether this is an acceptable hypothesis, and
our present problem is to estimate the unknown values of a’s and b’s from the observed x’s.

24.2. The joint probability of the $\xi$’s is

$$dF \propto \frac{1}{v^{½pq}}\exp\left\{-\frac{1}{2v}\sum_{j,k}(x_{jk} - a_j - b_k)^2\right\}d\xi_{11}\ldots d\xi_{pq}, \tag{24.2}$$

where v is the variance of $\xi$, and in conformity with the notation used in the previous chapter
we have p A-classes and q B-classes. The maximum likelihood estimates of the a’s and
b’s are then those which minimise the sum in curly brackets in (24.2), that is to say, the
least-squares solution of the equations (24.1). In the usual way we find

$$\sum_{k=1}^{q}(x_{jk} - a_j - b_k) = 0, \qquad j = 1, \ldots p$$
$$\sum_{j=1}^{p}(x_{jk} - a_j - b_k) = 0, \qquad k = 1, \ldots q \tag{24.3}$$

which reduce to

$$x_{j\cdot} - a_j - b_{\cdot} = 0, \qquad j = 1, \ldots p$$
$$x_{\cdot k} - a_{\cdot} - b_k = 0, \qquad k = 1, \ldots q. \tag{24.4}$$

Summing the first equation over j, dividing by p, and subtracting from the first, we obtain

$$x_{j\cdot} - x_{\cdot\cdot} = a_j - a_{\cdot}, \qquad j = 1, \ldots p \tag{24.5}$$

and similarly

$$x_{\cdot k} - x_{\cdot\cdot} = b_k - b_{\cdot}, \qquad k = 1, \ldots q. \tag{24.6}$$

In (24.5) there are p equations, but if we sum them all we reach the identity 0 = 0, so that
only $p - 1$ are independent. There is thus an element of indeterminacy which we may
remove by supposing that $a_{\cdot} = 0$. Similarly we may take $b_{\cdot} = 0$, and then we have

$$a_j = x_{j\cdot} - x_{\cdot\cdot}, \qquad j = 1, \ldots p \tag{24.7}$$

$$b_k = x_{\cdot k} - x_{\cdot\cdot}, \qquad k = 1, \ldots q. \tag{24.8}$$

Our estimate of any class-effect is equal to the deviation of the mean in that class from
the total mean.

24.3. Evidently similar equations arise in the general n-way classification. We shall
see below that they break down for unequal numbers in subclasses, except in a special
case when the numbers are proportionate.

The assumption that $a_j$ and $b_k$ have zero means is not, in effect, a restriction on generality
but only a convention. If we prefer it, we may consider the slightly more general
hypothesis that $\xi$ has a mean m, in which case we have to minimise

$$\sum_{j,k}(x_{jk} - a_j - b_k - m)^2. \tag{24.9}$$

This will be found to lead back to equations (24.7) and (24.8), with the additional equation
for estimating m

$$m = x_{\cdot\cdot}. \tag{24.10}$$

Or again, if we prefer to absorb m into the a-effects we have

$$a_j = x_{j\cdot}, \qquad b_k = x_{\cdot k} - x_{\cdot\cdot}, \tag{24.11}$$

the mean of $a_j$ in this case not vanishing. Which form we use is a matter of convenience.
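
As an illustration of these estimating equations, the following Python sketch computes the general mean and the class-effects of (24.7), (24.8) and (24.10) for a hypothetical two-way table with one member in each subclass, and forms the residuals whose row and column sums should vanish as required by the least-squares equations. The data and the function name are invented for the purpose.

```python
def class_effects(table):
    """Least-squares estimates for x_jk = m + a_j + b_k + residual.

    Returns (m, a, b) with a_j = row mean - general mean and
    b_k = column mean - general mean, so that both sets sum to zero.
    """
    p, q = len(table), len(table[0])
    m = sum(map(sum, table)) / (p * q)
    a = [sum(row) / q - m for row in table]
    b = [sum(table[j][k] for j in range(p)) / p - m for k in range(q)]
    return m, a, b


# Hypothetical table: p = 3 A-classes (rows), q = 4 B-classes (columns).
x = [
    [12.0, 15.0, 11.0, 14.0],
    [10.0, 13.0, 12.0, 11.0],
    [16.0, 18.0, 13.0, 17.0],
]
m, a, b = class_effects(x)

# Residuals after fitting; each row and each column of them sums to zero.
residuals = [[x[j][k] - m - a[j] - b[k] for k in range(4)] for j in range(3)]
print(m, a, b)
```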


24.4. It is important to notice that the equations of estimation which we have just
reached give each $a_j$ and $b_k$ independently of values in other classes. We obtain the same
equation for $a_j$ whether we happen to be estimating other a’s and b’s or not. This property,
as we shall see shortly, fails to hold if the numbers in subclasses are disproportionate.
The situation is similar to that in which we can determine the constants in a regression
line independently of the others if orthogonal polynomials are used, in that each constant
is given by a separate equation not containing any of the others. Data of this kind are
called orthogonal.

The direct comparison of class-means which is possible with orthogonal data can be
seen, from general considerations, to be legitimate. In comparing $x_{i\cdot} - x_{\cdot\cdot}$ with $x_{j\cdot} - x_{\cdot\cdot}$,
the estimates of the effects in the ith and jth A-classes, we are in each case averaging over
q B-classes with one member in each. The B-classes, therefore, affect each mean to the
same extent and do not affect their difference. If there are more members in some subclasses
than in others, the means are unequally weighted with different B-effects and
the comparison is invalidated.


24.5. Regarding $x_{j\cdot} - x_{\cdot\cdot}$ as the estimate of $a_j$ and $x_{\cdot k} - x_{\cdot\cdot}$ as the estimate of $b_k$,
we see that the familiar equation

$$\sum_{j,k}(x_{jk} - x_{\cdot\cdot})^2 = \sum_{j,k}(x_{j\cdot} - x_{\cdot\cdot})^2 + \sum_{j,k}(x_{\cdot k} - x_{\cdot\cdot})^2 + \sum_{j,k}(x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2 \tag{24.12}$$

can be regarded as an analysis of the sum of squares on the left, which has $pq - 1$ degrees
of freedom, into terms in which there is one degree of freedom for every fitted constant and
a residual with $(p - 1)(q - 1)$ degrees of freedom. Every constant fitted reduces the
number of degrees of freedom in the residual by unity.
Unequal Numbers in Subclasses

24.6. For a one-way classification we have already considered (23.7 and 23.8) the 
case where the numbers in subclasses are unequal. It was seen that the total sum of squares 
could be expressed as a sum between classes and A residual which were independently 
distributed and whose ratio therefore provided a homogeneity test in the usual way. 

When we try to extend this result to two-way or generally to w-way classifications, 
we begin to run into difficulties. We can still find, as shown below, an estimator of v based 
on p — 1 degrees of freedom and differences between A-classes, and one with q — 1 d.f. 
based on differences between 27-classes ; but these are no longer independent, and conse¬ 
quently we cannot subtract their sum from the total sum of squares in order to obtain 
a residual or an interaction term which also provides an unbiassed estimator. 

On the other hand, there is now available an independent estimator of $v$ which did
not appear in the orthogonal case where only one member was included in each subclass.
In fact, since there are several members in any given subclass, we can find an estimator
of $v$ based on those members alone; and we may pool all such to form an estimator with
$N - pq$ degrees of freedom, where there are $pq$ subclasses. This estimator will be independent
of subclass means and any estimators based on them, and hence provides
a “residual” such as we require to carry out homogeneity tests.

24.7. Suppose we have a two-way classification into $p$ A-classes and $q$ B-classes, and
let the number of members in the subclass $A_j B_k$ be $n_{jk}$. Let $\bar{x}_{jk}$ be the mean of these
members. We may array the means as

$$\begin{matrix}
\bar{x}_{11} & \bar{x}_{12} & \ldots & \bar{x}_{1q} \\
\bar{x}_{21} & \bar{x}_{22} & \ldots & \bar{x}_{2q} \\
\vdots & \vdots & & \vdots \\
\bar{x}_{p1} & \bar{x}_{p2} & \ldots & \bar{x}_{pq}
\end{matrix} \qquad (24.13)$$

Now we may, in the first instance, test for homogeneity by ignoring the differences
between A- and B-classification and merely regarding the data as a one-way classification
with $pq$ classes. The usual test for homogeneity is then applicable. The sum of squares
between means of classes will have $pq - 1$ degrees of freedom, the total $N - 1$ d.f., and
the residual $N - 1 - (pq - 1) = N - pq$ d.f. This residual, in fact, is the one mentioned
in the previous section, and is based on the pooled sums of squares within the $pq$
classes. The other term based on $pq - 1$ degrees of freedom is the sum

$$\sum_{j,k} n_{jk} (\bar{x}_{jk} - \bar{x})^2$$

and is derivable from the array (24.13).

24.8. To test the effect of A-classification separately we proceed as follows:

Any $\bar{x}_{jk}$ is the mean of $n_{jk}$ values and, on the usual hypothesis as to normality, will
have variance $v / n_{jk}$. If $\bar{x}$ is the mean of all $N$ values we have

$$\bar{x} = \frac{1}{N} \sum_{j,k} n_{jk} \bar{x}_{jk}. \qquad (24.14)$$

Let the marginal unweighted means in (24.13) be $\bar{x}_{j.}$, $\bar{x}_{.k}$, so that

$$\bar{x}_{j.} = \frac{1}{q} \sum_k \bar{x}_{jk}, \qquad \bar{x}_{.k} = \frac{1}{p} \sum_j \bar{x}_{jk}. \qquad (24.15)$$

On the hypothesis of homogeneity the variance of $\bar{x}_{j.}$ is given by

$$\frac{v}{q^2} \sum_k \frac{1}{n_{jk}} = \frac{v}{N_j}, \qquad (24.16)$$

where

$$\frac{1}{N_j} = \frac{1}{q^2} \sum_k \frac{1}{n_{jk}}. \qquad (24.17)$$


Now let us regard the means $\bar{x}_{j.}$ as the means in $p$ classes whose numbers are $N_j$, as
is legitimate from (24.16). Then writing

$$c = \frac{\sum_j N_j \bar{x}_{j.}}{\sum_j N_j}, \qquad (24.18)$$

we have for an unbiassed estimator of $v$

$$\frac{1}{p - 1} \sum_j N_j (\bar{x}_{j.} - c)^2 = \frac{1}{p - 1} \Big\{ \sum_j (N_j \bar{x}_{j.}^2) - c^2 \sum_j N_j \Big\}. \qquad (24.19)$$

This estimator has $p - 1$ degrees of freedom and is distributed as $\chi^2$. (This follows from
the one-way case except that $N_j$ may not be integral; and its general truth may be established
as in Exercise 23.1.) It is independent of the residual with $N - pq$ d.f., and hence
the A-effects may be tested separately.

Similarly, if

$$\frac{1}{M_k} = \frac{1}{p^2} \sum_j \frac{1}{n_{jk}}, \qquad (24.20)$$

an unbiassed estimator of $v$ is given by

$$\frac{1}{q - 1} \Big\{ \sum_k (M_k \bar{x}_{.k}^2) - d^2 \sum_k M_k \Big\}, \qquad (24.21)$$

where

$$d = \frac{\sum_k M_k \bar{x}_{.k}}{\sum_k M_k}, \qquad (24.22)$$

and this also may be compared with the independent estimator based on $N - pq$ d.f.


Example 24.1 (data from Brandt (1933) considered by Yates (1934a))

Table 24.1 shows, for a number of breeds of pig, the numbers of each breed,
divided into male and female, and the total logarithm of the percentage bacon yielded by
the slaughtered carcases. The logarithm has been taken so as to normalise the variate.

TABLE 24.1

Numbers and Logarithm of Percentage Bacon in Breeds of Pigs.

                          Female                       Male
Breed             Number   Log. Percent.      Number   Log. Percent.
                           Bacon                       Bacon
Hampshire            33        66.55             89       181.04
Duroc Jersey         51        98.69            141       281.43
Tamworth             13        25.90             17        34.20
Yorkshire             4         7.62              9        17.58
Berkshire             8        14.64              4         8.20
Poland China         15        28.11             32        64.42
Chester White        35        66.90             47        90.52
Others               12        23.32             23        46.70
Totals              171       331.73            362       724.09

The total sum of squares, which is not obtainable from this table as it stands, we quote
as 13.0142.

The class-means and reciprocals of class-frequencies are given in Table 24.2.

TABLE 24.2

Class-Means and Reciprocals of Class-Frequencies for the Data of Table 24.1.

                          Female                   Male             Unweighted
Breed                Mean       1/n_jk        Mean       1/n_jk    Mean of Means
Hampshire          2.016667    0.03030      2.034158    0.01124      2.025412
Duroc Jersey       1.935099    0.01961      1.995958    0.00709      1.965528
Tamworth           1.992307    0.07692      2.011765    0.05882      2.002036
Yorkshire          1.905000    0.25000      1.953333    0.11111      1.929167
Berkshire          1.830000    0.12500      2.050000    0.25000      1.940000
Poland China       1.874000    0.06667      2.013125    0.03125      1.943562
Chester White      1.911429    0.02857      1.925958    0.02128      1.918694
Others             1.943333    0.08333      2.030434    0.04348      1.986884
Unweighted Mean
of Means           1.925979    0.68040      2.001841    0.53427      1.963910
                               (Total)                  (Total)

Taking first the classification into male and female ($q = 8$), we find, from the relations

$$\frac{1}{N_j} = \frac{1}{q^2} \sum_k \frac{1}{n_{jk}},$$

$$N_1 = \frac{64}{0.68040} = 94.0623, \qquad N_2 = \frac{64}{0.53427} = 119.7896.$$

Then, from (24.18),

$$c = \frac{\sum N_j \bar{x}_{j.}}{\sum N_j} = \frac{(94.0623 \times 1.925979) + (119.7896 \times 2.001841)}{94.0623 + 119.7896} = 1.968474.$$

Thus our estimate of $v$, with one degree of freedom,

$$= \sum (N_j \bar{x}_{j.}^2) - c^2 \sum N_j = 0.3032.$$

Similarly for the eight breed-classes we find an estimate of $v$ with seven degrees of
freedom to be $0.6056 / 7 = 0.0865$.
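
The arithmetic of (24.17) to (24.19) for the pig data may be sketched as follows. The function name `estimate_of_v` is our own, and the subclass means are recomputed from the totals of Table 24.1, so the last digits may differ slightly from those printed in the text.

```python
# Table 24.1: (number, total of log percentage bacon) per breed.
female = [(33, 66.55), (51, 98.69), (13, 25.90), (4, 7.62),
          (8, 14.64), (15, 28.11), (35, 66.90), (12, 23.32)]
male = [(89, 181.04), (141, 281.43), (17, 34.20), (9, 17.58),
        (4, 8.20), (32, 64.42), (47, 90.52), (23, 46.70)]

def estimate_of_v(n, xbar):
    """Sum of squares and d.f. of (24.19) for the classes forming the rows of n, xbar."""
    width = len(n[0])
    weights = [width ** 2 / sum(1.0 / m for m in row) for row in n]   # N_j of (24.17)
    means = [sum(row) / width for row in xbar]                         # unweighted means (24.15)
    c = sum(w * m for w, m in zip(weights, means)) / sum(weights)      # (24.18)
    ss = sum(w * (m - c) ** 2 for w, m in zip(weights, means))
    return ss, len(n) - 1

n = [[cell[0] for cell in sex] for sex in (female, male)]
xbar = [[cell[1] / cell[0] for cell in sex] for sex in (female, male)]

ss_sex, df_sex = estimate_of_v(n, xbar)
# Transposing the arrays interchanges the roles of sex and breed, giving (24.21).
ss_breed, df_breed = estimate_of_v([list(col) for col in zip(*n)],
                                   [list(col) for col in zip(*xbar)])
print(round(ss_sex / df_sex, 4), round(ss_breed / df_breed, 4))
```

The two quotients should come out close to the 0.3032 and 0.0865 found above.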

Considering the 16 subclasses as a one-way classification, we find the following
preliminary analysis (the arithmetical details of which we omit):

TABLE 24.3

Analysis of Variance of Data in Table 24.1.

                      Sum of Squares     d.f.     Quotient
Between classes            1.2715          15      0.0848
Residual                  11.7427         517      0.0227
Totals                    13.0142         532

The variance ratio here gives a value of z equal to 0.659, which is significant. Thus the
data are not homogeneous.
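
The omitted arithmetic can be sketched from the subclass numbers and totals of Table 24.1, taking the total sum of squares 13.0142 as quoted. This is an illustration with variable names of our own; z is computed as half the natural logarithm of the variance ratio.

```python
import math

# (number, total) for the 16 subclasses of Table 24.1: females first, then males.
cells = [(33, 66.55), (51, 98.69), (13, 25.90), (4, 7.62),
         (8, 14.64), (15, 28.11), (35, 66.90), (12, 23.32),
         (89, 181.04), (141, 281.43), (17, 34.20), (9, 17.58),
         (4, 8.20), (32, 64.42), (47, 90.52), (23, 46.70)]
ss_total = 13.0142          # quoted in the text; not obtainable from the table

N = sum(n for n, _ in cells)
grand_total = sum(t for _, t in cells)
ss_between = sum(t * t / n for n, t in cells) - grand_total ** 2 / N
df_between = len(cells) - 1
ss_residual = ss_total - ss_between
df_residual = N - len(cells)

# Fisher's z is half the natural logarithm of the variance ratio.
z = 0.5 * math.log((ss_between / df_between) / (ss_residual / df_residual))
print(round(ss_between, 4), df_between, round(ss_residual, 4), df_residual, round(z, 3))
```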

We now require to decide whether the departure from homogeneity is due to either
breed or sex or to a combination of the two. For sex-differences we have found an estimate
of $v$ equal to 0.3032 with one d.f. Comparing this with the independent residual from
Table 24.3 of 0.0227 with 517 d.f., we find that the effect of sex is significant. Similarly,
for breed, the estimate of $v$ is 0.0865 for 7 d.f., which again is significant. We conclude
that both breed and sex influence the departure from homogeneity.

It is particularly important to note that since the estimates between breeds and between
sex are dependent, we cannot analyse the variance as follows:

TABLE 24.4

Incorrect Form of Analysis of Variance of Data of Table 24.1.

                      Sum of Squares     d.f.     Quotient
Between sexes              0.3032           1      0.3032
Between breeds             0.6056           7      0.0865
“Interaction”              0.3627           7      0.0518
Residual                  11.7427         517      0.0227
Totals                    13.0142         532

In fact the term shown as “interaction”, calculated so as to make the sums of squares
and degrees of freedom additive in the usual way, is not an unbiassed estimate of $v$. This
is a critical point of difference between the orthogonal and the non-orthogonal case.


24.9. Suppose that the homogeneity test has shown the existence of significant
class-effects. As before, we turn to consider the hypothesis that the data can be expressed
as the sum of A- and B-effects separately with a random normal residual. Let $x_{jkl}$ be
the typical member of the $(j, k)$th subclass, $l$ varying from 1 to $n_{jk}$. Our hypothesis is then

$$x_{jkl} = a_j + b_k + \zeta_{jkl}, \qquad (24.23)$$

where $\zeta$ is normal with variance $v$. For convenience we will regard the mean of $\zeta$ as absorbed
in the coefficients $a$, so that we may take $\zeta$ to have zero mean.

The usual process of estimation of the $a$'s and $b$'s leads to the minimisation of the
sum over all $N$ values of

$$\sum (x_{jkl} - a_j - b_k)^2. \qquad (24.24)$$

Differentiating with respect to $a_j$ and $b_k$, we find the series of equations

$$\sum_k \sum{}' (x_{jkl} - a_j - b_k) = 0, \quad j = 1, \ldots, p,$$
$$\sum_j \sum{}' (x_{jkl} - a_j - b_k) = 0, \quad k = 1, \ldots, q, \qquad (24.25)$$

where $\sum'$ denotes summation over the $n_{jk}$ values in a subclass. These equations reduce to

$$\sum_k n_{jk} a_j + \sum_k n_{jk} b_k = \sum_k n_{jk} \bar{x}_{jk},$$
$$\sum_j n_{jk} a_j + \sum_j n_{jk} b_k = \sum_j n_{jk} \bar{x}_{jk}.$$

Writing $N_{j.}$ for $\sum_k n_{jk}$ and $N_{.k}$ for $\sum_j n_{jk}$, we have

$$N_{j.} a_j + \sum_k n_{jk} b_k = \sum_k n_{jk} \bar{x}_{jk}, \quad j = 1, \ldots, p, \qquad (24.26)$$
$$\sum_j n_{jk} a_j + N_{.k} b_k = \sum_j n_{jk} \bar{x}_{jk}, \quad k = 1, \ldots, q. \qquad (24.27)$$

To which we may add

$$\sum_k b_k = 0. \qquad (24.28)$$

Had we chosen to absorb the mean of $\zeta$ into the $b$'s, this last equation would be replaced
by $\sum_j a_j = 0$.

When all the $n$'s are equal these equations reduce to the orthogonal case, and each
$a$- or $b$-coefficient can be independently estimated. In the contrary case the equations
have to be solved as they stand.


Example 24.2

Returning to the data of Table 24.1, we find for equations (24.26) and (24.27) the
following, the values of the constants required being obtainable from the body or marginal
sums of the table itself:

$$\begin{aligned}
171 a_1 + 33 b_1 + 51 b_2 + 13 b_3 + 4 b_4 + 8 b_5 + 15 b_6 + 35 b_7 + 12 b_8 &= 331.73 \\
362 a_2 + 89 b_1 + 141 b_2 + 17 b_3 + 9 b_4 + 4 b_5 + 32 b_6 + 47 b_7 + 23 b_8 &= 724.09 \\
33 a_1 + 89 a_2 + 122 b_1 &= 247.59 \\
51 a_1 + 141 a_2 + 192 b_2 &= 380.12 \\
13 a_1 + 17 a_2 + 30 b_3 &= 60.10 \\
4 a_1 + 9 a_2 + 13 b_4 &= 25.20 \\
8 a_1 + 4 a_2 + 12 b_5 &= 22.84 \\
15 a_1 + 32 a_2 + 47 b_6 &= 92.53 \\
35 a_1 + 47 a_2 + 82 b_7 &= 157.42 \\
12 a_1 + 23 a_2 + 35 b_8 &= 70.02
\end{aligned}$$

To which we may add $a_1 + a_2 = 0$.

The solutions are

$-a_1 = a_2 = 0.026507$;

$b_1 = 2.017259$; $b_2 = 1.967367$; $b_3 = 1.999799$; $b_4 = 1.928267$;

$b_5 = 1.912169$; $b_6 = 1.959136$; $b_7 = 1.915877$; $b_8 = 1.992241$.

These give us the “best” estimates of the mean effects of sex and breed on the
hypothesis expressed by (24.23).

The mean of the $b$'s is 1.961514, which may be taken as an estimate of the mean of $\zeta$,
the $b$-effects then being the differences of the above $b$-values from this mean.
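
One way of solving these equations numerically is to satisfy (24.26) and (24.27) alternately until the values settle. This is only one of several routine methods and is not claimed to be that used for the figures above. The sketch takes the subclass numbers and totals from Table 24.1 and imposes the condition that the two sex constants sum to zero at the end; the variable names are our own.

```python
# Table 24.1: (number, total of log percentage bacon) per breed.
female = [(33, 66.55), (51, 98.69), (13, 25.90), (4, 7.62),
          (8, 14.64), (15, 28.11), (35, 66.90), (12, 23.32)]
male = [(89, 181.04), (141, 281.43), (17, 34.20), (9, 17.58),
        (4, 8.20), (32, 64.42), (47, 90.52), (23, 46.70)]

n = [[cell[0] for cell in sex] for sex in (female, male)]
T = [[cell[1] for cell in sex] for sex in (female, male)]   # T_jk = n_jk * mean_jk
p, q = len(n), len(n[0])

a = [0.0] * p
b = [0.0] * q
for _ in range(200):
    # (24.26): N_j. a_j = sum over k of n_jk (mean_jk - b_k)
    for j in range(p):
        a[j] = sum(T[j][k] - n[j][k] * b[k] for k in range(q)) / sum(n[j])
    # (24.27): N_.k b_k = sum over j of n_jk (mean_jk - a_j)
    for k in range(q):
        b[k] = (sum(T[j][k] - n[j][k] * a[j] for j in range(p))
                / sum(n[j][k] for j in range(p)))

# Impose a_1 + a_2 = 0, moving the mean into the b's as in the example.
shift = sum(a) / p
a = [v - shift for v in a]
b = [v + shift for v in b]

print([round(v, 6) for v in a])
print([round(v, 6) for v in b])
```

The printed constants should agree with the solutions quoted above to about the fifth decimal place.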


24.10. Let us now consider the analysis of variance in the non-orthogonal case,
when constants have been fitted by least squares in the above-mentioned way.

To make the discussion clearer we will regard the estimation as relating to $p$ constants
$a_j$, related by $\sum (a_j) = 0$, $q$ constants $b_k$, related by $\sum (b_k) = 0$, and the mean $m$. There
are thus $p + q - 1$ independent constants which, in effect, provide estimates of the means
of subclasses. Whatever these means really are, the residual quotient based on $N - pq$
degrees of freedom gives an unbiassed estimator of $v$, the common variance. We have
now to analyse the remaining sum of squares based on $pq - 1$ d.f.

If the true (population) values of the constants are denoted by $\alpha_j$, $\beta_k$ and $\mu$, the sum

$$\sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2$$

is distributed as $v\chi^2$ with $N$ degrees of freedom. Developing yet another variation on
a familiar theme, we show that the corresponding quantity

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2 - \sum (a_j - \alpha_j)^2 - \sum (b_k - \beta_k)^2 - \sum (m - \mu)^2 \qquad (24.29)$$

is distributed as $v\chi^2$ with $N - (p + q - 1)$ d.f.

In fact, equations (24.26) and (24.27) show that the estimators $a$, $b$ (and in our present
case $m$ also) are linear in the variables $x$. We can then find $p + q - 1$ orthogonal normal
variables in terms of which they can be expressed. Their sum of squares will be distributed
as $v\chi^2$ with $p + q - 1$ degrees of freedom (not some multiple of $\chi^2$ because the mean value
must be $p + q - 1$ in virtue of 18.17). Thus the remaining term $\sum (x_{jkl} - a_j - b_k - m)^2$
is distributed as $v\chi^2$ with $N - (p + q - 1)$ degrees of freedom, independently of the portion
due to the constants $a$, $b$ and $m$.

Furthermore, the actual reduction in sums of squares, equivalent to the sum of the
last three terms in (24.29), may be easily determined. Precisely as in the similar problem
of evaluating residuals in a regression equation, we have

$$\sum_{j,k,l} (x_{jkl} - a_j - b_k - m)^2 = \sum_{j,k,l} x_{jkl}^2 - \sum_j a_j \sum_{k,l} x_{jkl} - \sum_k b_k \sum_{j,l} x_{jkl} - m \sum_{j,k,l} x_{jkl}, \qquad (24.30)$$

where, of course, summation takes place over all values.

24.11. The total sum of squares is already calculated about the estimated mean
$m$, so that the reduction for the term $\sum m^2 = N m^2$ has already been taken into account.
The total sum is then distributed as $v\chi^2$ with $N - 1$ d.f., as we already know. We know
further that we can split off the independent residual sum based on $N - pq$ degrees of
freedom. This leaves us with a sum based on $pq - 1$ d.f. From the previous section it
follows that we can analyse this sum into two parts: (a) the sum of squares due to fitting
the constants $a_j$ and $b_k$, accounting for $p + q - 2$ d.f., and (b) the remainder based on
$pq - 1 - (p + q - 2) = (p - 1)(q - 1)$ d.f. This remainder is independent of the sum
of squares due to fitting constants and provides an unbiassed estimator of $v$. If the ratio,
as compared with the residual based on $N - pq$ d.f., is significant, the hypothesis of additive
effects breaks down. In short, we may regard this quantity as an interaction term.

24.12. One important point to notice in this connection is that the interaction term 
depends on whether p + q — 2 or fewer constants are fitted. In the orthogonal case we 
can determine an interaction term once and for all, however things stand in regard to the 
estimation of inter-class effects ; but for non-orthogonal data the number of class-effects 
estimated affects the interaction term, and if necessary a new significance test has to be 
applied if further estimates are calculated. The situation is similar to the testing of 
regression coefficients when orthogonal polynomials are not employed. 

Example 24.3

Returning again to the data discussed in Examples 24.1 and 24.2, let us regard the
means in all 16 subclasses as simultaneously under estimate. For the reduction in sum
of squares due to the constants we find, using the values of $a$ and $b$ found in Example 24.2,

$$0.026507 (-331.73 + 724.09) + (2.017259 \times 247.59) + (1.967367 \times 380.12) + \ldots - \frac{(1055.82)^2}{533} = 1.04146.$$

Here, for instance, the term in $a_j$ is given by multiplying $a_j$ by the sum $\sum_{k,l} x_{jkl}$ already
found. The last term removes the effect of including the mean among the $b$'s.

The sum of squares between classes was found in Example 24.1 to be 1.2715, based
on 15 d.f. We then have

                                             Sum of Squares     d.f.     Quotient
Sex and breed (estimation of constants)           1.0415           8      0.1302
Interaction                                       0.2300           7      0.0329
Between classes                                   1.2715          15

Comparing the interaction term 0.0329 (7 d.f.) with the residual 0.0227 (517 d.f.) we see
that it is not significant.

If we neglect sex and consider breed alone, we have only to estimate eight constants
$b_1, \ldots, b_8$ subject to $\sum (b) = 0$. The sum of squares for breed alone is given by

$$\frac{1}{122} (247.59)^2 + \frac{1}{192} (380.12)^2 + \ldots - \frac{1}{533} (1055.82)^2 = 0.7253.$$

Similarly the sum of squares for sex alone will be found to be 0.4224. We have the
following analysis:

TABLE 24.5

Further Analysis of Variance of Data of Table 24.1.

                                             Sum of Squares     d.f.     Quotient
Test for Sex
  Between breed (estimation of constants)         0.7253           7        -
  Sex                                             0.3162           1      0.3162
  Sex and breed                                   1.0415           8        -
Test for Breed
  Between sex (estimation of constants)           0.4224           1        -
  Breed                                           0.6191           7      0.0884
  Sex and breed                                   1.0415           8        -
Interaction                                       0.2300           7      0.0329
Between classes                                   1.2715          15

Here, for instance, if we test for sex there are seven independent constants for breed
and one for sex, the latter being the only one that interests us; and similarly for breed.
On comparison with the residual 0.0227 both sex and breed are found to be significant.
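
The sums of squares of this example can be reproduced, to the accuracy of the table, by the following sketch. It fits the constants by satisfying (24.26) and (24.27) alternately, which is our own choice of method, and then applies the reduction of (24.30). The variable names are hypothetical.

```python
female = [(33, 66.55), (51, 98.69), (13, 25.90), (4, 7.62),
          (8, 14.64), (15, 28.11), (35, 66.90), (12, 23.32)]
male = [(89, 181.04), (141, 281.43), (17, 34.20), (9, 17.58),
        (4, 8.20), (32, 64.42), (47, 90.52), (23, 46.70)]
n = [[cell[0] for cell in sex] for sex in (female, male)]
T = [[cell[1] for cell in sex] for sex in (female, male)]
p, q = len(n), len(n[0])

row_n = [sum(n[j]) for j in range(p)]
col_n = [sum(n[j][k] for j in range(p)) for k in range(q)]
row_tot = [sum(T[j]) for j in range(p)]
col_tot = [sum(T[j][k] for j in range(p)) for k in range(q)]
correction = sum(row_tot) ** 2 / sum(row_n)          # (grand total)^2 / N

# Fit a_j and b_k by solving (24.26) and (24.27) alternately.
a, b = [0.0] * p, [0.0] * q
for _ in range(200):
    for j in range(p):
        a[j] = (row_tot[j] - sum(n[j][k] * b[k] for k in range(q))) / row_n[j]
    for k in range(q):
        b[k] = (col_tot[k] - sum(n[j][k] * a[j] for j in range(p))) / col_n[k]

# Reduction due to the fitted constants, as in (24.30), about the general mean.
ss_fit = (sum(a[j] * row_tot[j] for j in range(p))
          + sum(b[k] * col_tot[k] for k in range(q)) - correction)
ss_cells = sum(T[j][k] ** 2 / n[j][k] for j in range(p) for k in range(q)) - correction
ss_sex_alone = sum(row_tot[j] ** 2 / row_n[j] for j in range(p)) - correction
ss_breed_alone = sum(col_tot[k] ** 2 / col_n[k] for k in range(q)) - correction

interaction = ss_cells - ss_fit
sex_given_breed = ss_fit - ss_breed_alone
breed_given_sex = ss_fit - ss_sex_alone
print(round(ss_fit, 4), round(interaction, 4),
      round(sex_given_breed, 4), round(breed_given_sex, 4))
```

The reduction does not depend on how the mean is divided between the $a$'s and the $b$'s, so no side condition need be imposed before it is computed.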

24.13. The reader may perhaps find the various tests of Examples 24.1 and 24.3
confusing, and we accordingly summarise our results for the case of unequal numbers in
subclasses.

In every case, except where each subclass contains not more than one member, an
estimate of the common variance $v$ may be obtained, with $N - pq$ d.f., by pooling the
sums of squares within the $pq$ subclasses. Call this $v_1$.

Homogeneity may then be tested (a) by considering the $pq$ classes as a single one-way
classification and comparing the quotient between means with $v_1$, or (b) by calculating
for either classification separately the estimates based on (24.19) and comparing them with $v_1$.

If homogeneity is rejected in favour of the additive effect of classes expressed by the
usual hypothesis, the sum of squares between all classes based on $pq - 1$ d.f. may be split
into independent sums related to the fitting of the constants and to an interaction term.
The latter can be compared with $v_1$ to test for interaction. If this is not significant, alternative
tests for effects between A- and between B-classes may be derived by testing the
sum of squares attributable to the fitting of the respective constants against $v_1$. These
tests are, in effect, tests of one class neglecting the effect of the other, and may not be
accurate if the latter effect is not negligible. It is probably better to fit constants to both
classes simultaneously in the first instance.


Proportionate Frequencies 

24.14. We have previously spoken of non-orthogonal data as meaning any classification
with unequal frequencies in the subclasses, but there is one other case of unequal
frequencies for which orthogonality exists, namely the one in which frequencies are proportionate,
i.e. there are marginal frequencies $l_j$, $m_k$, such that

$$n_{jk} = l_j m_k. \qquad (24.31)$$

Here the means of A-classes are estimates of the individual corresponding $a$'s (though it
must not be overlooked that they are based on different numbers of members in margins),
and the sum of squares between A-means may be computed in the usual manner appropriate
to a one-way classification with unequal numbers. Similarly for B. The interactions
may be estimated by subtracting the A- and B-sums from the sum of squares between
classes. We leave it to the reader to verify these statements.
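
A numerical check of these statements may be sketched as follows. The marginal frequencies and subclass means are hypothetical values chosen only for illustration. The sketch verifies that the weighted marginal means satisfy the least-squares equations (24.26) and (24.27), and that the sum of squares between classes splits exactly into A, B and interaction parts.

```python
# Hypothetical proportionate frequencies n_jk = l_j * m_k and hypothetical subclass means.
l = [1, 2, 3]
m = [2, 4]
xbar = [[5.0, 7.0], [6.0, 9.0], [4.0, 8.0]]
p, q = len(l), len(m)
n = [[l[j] * m[k] for k in range(q)] for j in range(p)]
cells = [(j, k) for j in range(p) for k in range(q)]

N = sum(map(sum, n))
row_n = [sum(n[j]) for j in range(p)]
col_n = [sum(n[j][k] for j in range(p)) for k in range(q)]
grand = sum(n[j][k] * xbar[j][k] for j, k in cells) / N
row_mean = [sum(n[j][k] * xbar[j][k] for k in range(q)) / row_n[j] for j in range(p)]
col_mean = [sum(n[j][k] * xbar[j][k] for j in range(p)) / col_n[k] for k in range(q)]

# Candidate constants: deviations of the weighted A-means, with the mean carried by the b's.
a = [row_mean[j] - grand for j in range(p)]
b = list(col_mean)

# Left-hand minus right-hand sides of (24.26) and (24.27); all should vanish.
res_a = [sum(n[j][k] * (a[j] + b[k] - xbar[j][k]) for k in range(q)) for j in range(p)]
res_b = [sum(n[j][k] * (a[j] + b[k] - xbar[j][k]) for j in range(p)) for k in range(q)]

ss_cells = sum(n[j][k] * (xbar[j][k] - grand) ** 2 for j, k in cells)
ss_a = sum(row_n[j] * (row_mean[j] - grand) ** 2 for j in range(p))
ss_b = sum(col_n[k] * (col_mean[k] - grand) ** 2 for k in range(q))
ss_inter = sum(n[j][k] * (xbar[j][k] - a[j] - b[k]) ** 2 for j, k in cells)

print(res_a, res_b)
print(ss_cells, ss_a + ss_b + ss_inter)
```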


Special Case of 2 × 2 × ... Classification

24.15. The foregoing analysis can be extended to the n-way classification, but in
the general case the solution of the equations becomes rather complex and the arithmetic
a considerable nuisance. Where, however, the classifications are simple dichotomies the
problem simplifies to a great extent. For instance, in equations (24.27), if there are only
two values of $a_j$, which we may take to be $+a$ and $-a$, we have

$$N_{.k} b_k = \sum_j n_{jk} \bar{x}_{jk} - n_{1k} a + n_{2k} a.$$

We have selected the $a$'s so that $\sum (a) = 0$, which implies that the mean $m$ is amalgamated
with the $b$'s. Substituting for the $b$'s in (24.26), we find

$$a \Big\{ N_{1.} - \sum_k \frac{n_{1k} (n_{1k} - n_{2k})}{N_{.k}} \Big\} = \sum_k n_{1k} \bar{x}_{1k} - \sum_k \frac{n_{1k}}{N_{.k}} \sum_j n_{jk} \bar{x}_{jk},$$

which reduces to

$$2a \Big\{ \frac{n_{11} n_{21}}{n_{11} + n_{21}} + \frac{n_{12} n_{22}}{n_{12} + n_{22}} + \ldots \Big\} = \frac{n_{11} n_{21}}{n_{11} + n_{21}} (\bar{x}_{11} - \bar{x}_{21}) + \frac{n_{12} n_{22}}{n_{12} + n_{22}} (\bar{x}_{12} - \bar{x}_{22}) + \ldots \qquad (24.32)$$

Thus $2a$ is the weighted mean of the differences of corresponding B-class means, and $a$ may
be determined direct. So generally for a 2 × 2 × 2 ... classification. The differences
may be tested for homogeneity by the z-test, which in this case reduces to the t-test.
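
For the pig data of Table 24.1, where sex is the dichotomy, the direct determination can be sketched as below. Within each breed the difference between the female and male means is weighted by $n_{1k} n_{2k} / (n_{1k} + n_{2k})$, and half the weighted mean of these differences is taken as the constant for the first (female) class. The variable names are our own.

```python
# Table 24.1: (number, total of log percentage bacon) per breed.
female = [(33, 66.55), (51, 98.69), (13, 25.90), (4, 7.62),
          (8, 14.64), (15, 28.11), (35, 66.90), (12, 23.32)]
male = [(89, 181.04), (141, 281.43), (17, 34.20), (9, 17.58),
        (4, 8.20), (32, 64.42), (47, 90.52), (23, 46.70)]

# Weight n_1k n_2k / (n_1k + n_2k) and difference of subclass means for each breed.
weights = [nf * nm / (nf + nm) for (nf, _), (nm, _) in zip(female, male)]
diffs = [tf / nf - tm / nm for (nf, tf), (nm, tm) in zip(female, male)]

# Half the weighted mean difference gives +a for females and -a for males.
a = sum(w * d for w, d in zip(weights, diffs)) / (2 * sum(weights))
print(round(a, 6))
```

The value agrees, apart from rounding, with the $a_1$ obtained in Example 24.2 by solving the full set of equations.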


24.16. In view of the relative complexity of the non-orthogonal case, it is natural
to wonder whether any serious error would be committed if we regarded the $p \times q$ table
of array means as an ordinary two-way table with one member in each class and analysed
the variance accordingly. Evidently such a procedure sacrifices a lot of information about
variation in subclasses, but that is not the point. Is the analysis valid?

The hypothesis on which the analysis is based is equality of variance in subclasses.
If the numbers in subclasses are very unequal the means based on them will have very
unequal variances, and we expect that the analysis may be misleading. If, however, the
numbers are close to equality the analysis will probably be approximately correct.

Example 24.4

Reverting once again to the data considered in earlier examples, we have the following
analysis for the variance of the 2 × 8 table of class-means:

                      Sum of Squares     d.f.     Quotient
Between sex                0.3032           1      0.3032
Between breed              0.2635           7      0.0376
Residual                   0.2387           7      0.0341
Totals                     0.8054          15

The sum of squares between sex is the same as before, as it must be for a dichotomy,
but the effect of breed is seriously underestimated and would not be judged significant by
comparison with the interaction term, which is our residual. The numbers in the breed-classes
are, in fact, too different to justify the approximation.

The Missing Plot Technique 

24.17. The simplicity of the analysis of variance in the orthogonal case and the 
economy imported by keeping the number of values as low as possible often leads to the 
carrying out of experiments with only one member in each subclass. But this has a certain 
practical danger in that the value in a subclass may be lost through circumstances beyond 
the experimenter’s control. For instance, an animal may die in the course of an experiment, 
or a crop on a particular plot may be ruined by pest; or sometimes a record may actually 
be lost after measurements have been carried out. In such cases we may estimate the
missing values and perform a variance-analysis in the following way. 

24.18. Consider in the first place a $p \times q$ classification with certain missing values,
$r$ in number. We assume as usual that the variate-values are expressible in the form

$$x_{jk} = a_j + b_k + m + \zeta_{jk}, \qquad (24.33)$$

and we know that the “best” estimators of the constants are

$$m = \bar{x}_{..}, \qquad a_j = \bar{x}_{j.} - \bar{x}_{..}, \qquad b_k = \bar{x}_{.k} - \bar{x}_{..}. \qquad (24.34)$$

The quantities on the right are, however, unknown to us because of the missing values.
Suppose that we estimate the constants by minimising

$$\sum{}' (x_{jk} - a_j - b_k - m)^2, \qquad (24.35)$$

where the summation $\sum'$ takes place over known values. Our estimators are then determinate
and may be written $a'_j$, $b'_k$ and $m'$.

We will now estimate the missing value on the plot $(j, k)$ by the equation

$$X_{jk} = a'_j + b'_k + m'. \qquad (24.36)$$

We have

$$\sum (x_{jk} - a_j - b_k - m)^2 = \sum{}' (x_{jk} - a_j - b_k - m)^2 + \sum_r (X_{jk} - a_j - b_k - m)^2. \qquad (24.37)$$

Let us now consider this as a function to be minimised, involving the unknowns $a$, $b$, $m$
and $r$ further unknowns $X_{jk}$. The equations giving the latter will be obtained by differentiating
(24.37) with respect to each $X_{jk}$, and in fact are typified by

$$X_{jk} = a'_j + b'_k + m',$$

that is to say, by (24.36). The other constants are given by such equations as

$$\sum{}' (x_{jk} - a'_j - b'_k - m') + \sum_r (X_{jk} - a'_j - b'_k - m') = 0. \qquad (24.38)$$

The second term vanishes, and hence we obtain the same minimal values for $a'_j$, $b'_k$ and
$m'$ as by minimising (24.35) by itself. Furthermore, the equations of estimation (24.38)
may be written

$$\sum (x_{jk} - a'_j - b'_k - m') = 0, \qquad (24.39)$$

where the summation takes place over all values, those of the observed $x$'s where known
and over the estimated $X$'s where values are missing.

It follows that if we write $X_{jk}$ for the $r$ missing values, ascertain the residual sum of
squares, which will be a function of observations and these $r$ unknowns, and minimise
it for variation in these unknowns, we shall obtain equations providing estimates of the
unknowns equivalent to (24.38). The following example illustrates the method.

Example 24.5 (Yates, 1933b)

The following table shows the measurements of intensity of infection of certain potato
tubers under eight manurial treatments in ten blocks.

TABLE 24.6

Intensity of Infection of Potato Tubers.

                                          Blocks
Treatments     1      2      3      4      5      6      7      8      9     10     Totals
    1        3.55   2.29     b    2.00   3.34   3.83   3.86   3.50   2.23   2.91    27.51 + b
    2        2.30   4.03   2.54   2.82   3.29   2.93     f    2.55   2.20   2.30    24.96 + f
    3        3.96   3.62   3.46   2.50   2.94   3.70   3.82   2.54   3.18   3.69    33.41
    4        2.99   3.99   2.90   3.97   4.49   4.70   3.86     h    3.50   3.59    33.99 + h
    5          a    3.07   3.49   1.07   3.99   3.48   3.80   3.68   3.24   2.70    28.52 + a
    6        2.36   3.47   2.64   3.17   3.26   3.28     g      i    3.07   3.12    24.37 + g + i
    7        2.16   2.34   1.96   2.60   3.77     d    3.20   3.47   2.67   3.33    25.50 + d
    8        3.16   2.52   2.39   3.68     c      e    3.85   3.36   2.50   4.13    25.59 + c + e
Totals      20.48  25.33  19.38  21.81  25.08  21.92  22.39  19.10  22.59  25.77    223.85 + a + b + c
             + a           + b           + c   + d+e  + f+g  + h+i                  + d + e + f + g + h + i

There are nine missing values in this table, indicated by the letters $a, \ldots, i$. Omitting
purely numerical terms, which are irrelevant for the purposes of minimisation, we have
for the total sum of squares,

$$a^2 + b^2 + c^2 + \ldots + i^2 - \tfrac{1}{80} (223.85 + a + b + c + \ldots + i)^2;$$

for the sum of squares between blocks,

$$\tfrac{1}{8} \{ (20.48 + a)^2 + (19.38 + b)^2 + \ldots + (19.10 + h + i)^2 \} - \tfrac{1}{80} (223.85 + a + b + c + \ldots + i)^2;$$

and for that between treatments,

$$\tfrac{1}{10} \{ (27.51 + b)^2 + (24.96 + f)^2 + \ldots + (25.59 + c + e)^2 \} - \tfrac{1}{80} (223.85 + a + b + c + \ldots + i)^2.$$

The residual sum of squares is the difference of the first and the sum of the second and
third of these expressions. For minimisation we differentiate with respect to $a, b, \ldots, i$
in turn. On some arithmetic simplification we find

$$\begin{aligned}
63a + b + c + d + e + f + g + h + i &= 209.11 \\
a + 63b + c + d + e + f + g + h + i &= 190.03 \\
a + b + 63c + d - 7e + f + g + h + i &= 231.67 \\
a + b + c + 63d - 9e + f + g + h + i &= 199.36 \\
a + b - 7c - 9d + 63e + f + g + h + i &= 200.07 \\
a + b + c + d + e + 63f - 9g + h + i &= 199.73 \\
a + b + c + d + e - 9f + 63g + h - 7i &= 195.01 \\
a + b + c + d + e + f + g + 63h - 9i &= 239.07 \\
a + b + c + d + e + f - 7g - 9h + 63i &= 162.11
\end{aligned}$$

This set of linear equations can, of course, be solved by routine methods, but also by iterative
processes as follows:

The mean of existent values is 3.15. Assume this to be approximately the values of
$b, c, \ldots, i$. Then for $a$ we have, from the first of the above equations,

$$a = \tfrac{1}{63} \{ 209.11 - (8 \times 3.15) \} = 2.92.$$

Taking this value of $a$ and 3.15 for $c, d, \ldots, i$, we find for $b$ from the second equation,

$$b = \tfrac{1}{63} \{ 190.03 - (7 \times 3.15) - 2.92 \} = 2.62.$$

Similarly, from the third equation,

$$c = \tfrac{1}{63} \{ 231.67 + (2 \times 3.15) - 2.92 - 2.62 \} = 3.69,$$

and so on. On reaching $i$ we recalculate $a$ from the first equation, using the approximations
to the values of the other constants already obtained; and so on until our values do not
alter. In this case only a second approximation is necessary, the values being:

                     a      b      c      d      e      f      g      h      i
First Approx.      2.92   2.62   3.69   3.27   3.76   3.26   3.60   3.88   3.22
Second Approx.     2.88   2.58   3.73   3.33   3.76   3.32   3.61   3.89   3.22

These are our estimates of missing yields. The treatment means are found to be:

Treatment      1       2       3       4       5       6       7       8
Mean         3.009   2.828   3.341   3.788   3.140   3.120   2.883   3.308
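
The iterative process described in this example can be sketched as follows. Each pass takes the nine equations in turn and solves each for its own unknown, using the latest values of the others. The coefficients are 63 on the diagonal and 1 elsewhere, except -7 between two missing values of the same treatment and -9 between two of the same block. The helper name `sweep` is our own.

```python
names = "abcdefghi"
rhs = [209.11, 190.03, 231.67, 199.36, 200.07, 199.73, 195.01, 239.07, 162.11]

# Coefficients: 63 on the diagonal and 1 elsewhere, except -7 for two missing
# values in the same treatment and -9 for two in the same block.
A = [[63.0 if r == c else 1.0 for c in range(9)] for r in range(9)]
for u, v, coef in [("c", "e", -7.0), ("g", "i", -7.0),
                   ("d", "e", -9.0), ("f", "g", -9.0), ("h", "i", -9.0)]:
    r, c = names.index(u), names.index(v)
    A[r][c] = A[c][r] = coef

def sweep(x):
    """One pass through the equations, each unknown using the latest values of the others."""
    x = list(x)
    for r in range(9):
        others = sum(A[r][c] * x[c] for c in range(9) if c != r)
        x[r] = (rhs[r] - others) / A[r][r]
    return x

first = sweep([3.15] * 9)        # start from the mean of existent values
x = first
for _ in range(50):              # far more passes than needed; values settle quickly
    x = sweep(x)

for name, v1, v in zip(names, first, x):
    print(name, round(v1, 2), round(v, 2))
```

Because the diagonal coefficients are much larger than the others, the values settle within a few passes, which is why a second approximation suffices in the text.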

24.19. The question now arises how we may analyse the variance of data for which
missing values have been estimated in this way.

The original data provided a classification with unequal numbers in subclasses and
can be analysed by the methods given earlier in the chapter; except that, since no subclass
contains more than one member, we cannot find a residual sum of squares within subclasses
based on $N - pq$ d.f. ($N - pq$, in fact, is a negative number.) For instance,
regarding the data as a one-way classification with $pq - r$ classes, we shall have an analysis
of this type:

Sums of squares           d.f.
Between classes *         p + q - 2
Residual                  (p - 1)(q - 1) - r
Total                     pq - r - 1                    (24.40)

* It is assumed that no row or column in the two-way classification is entirely empty. If it were,
we should have to ignore it and confine attention to the remaining arrays.

The effect of the two classifications separately can be dealt with in the manner of
Example 24.1.

24.20. Two simplifications are possible. In the first place, since the minimisation
of the residual is the same for the original data as for the data completed by estimates of
missing values, we can use the latter to compute the residual precisely as for an orthogonal
case, which simplifies the arithmetic.

Secondly, it appears that to an adequate approximation we may substitute the estimated
values for missing values and analyse the resulting material in the ordinary way
as if it were orthogonal. If the proportion of missing values is high this approximation
may perhaps break down, and in practice we should probably regard the experiment as
ruined. More usually only a few records are missing, and the effect of replacing them by
estimates is hardly likely to affect judgments of significance seriously.

Example 24.6

Continuing the analysis of the data of the previous example, we find, for the total sum
of squares, 32.1012 with 70 d.f. The analysis of the completed data, that is to say the original
data plus the estimates of missing values, is as follows:

                         Sum of Squares     d.f.     Quotient
Between blocks                9.7176           9      1.0797
Between treatments            6.5812           7      0.9402
Residual                     17.6902          54      0.3276
Totals                       33.9890          70

Treating the original data as a case of unequal class numbers we find:

                                   Sum of Squares     d.f.     Quotient
Between blocks and treatments          14.4110          16      0.9007
Residual                               17.6902          54      0.3276
Totals                                 32.1012          70

For blocks only:

                         Sum of Squares     d.f.     Quotient
Between blocks                8.5690           9      0.9521
Remainder                     5.8420           7      0.8346
Blocks and treatments        14.4110          16

For treatments only:

                         Sum of Squares     d.f.     Quotient
Between treatments            6.2648           7      0.8950
Remainder                     8.1462           9      0.9051
Blocks and treatments        14.4110          16

Whether we use the analysis of completed data or the more exact form, we see that
differences between blocks and between treatments are significant as judged by the residual
variance. The two analyses are, in fact, not very different, and even with as many as nine
missing values out of 80 we should not err by substituting estimated values and treating
the data as orthogonal.

Relationship with Regression Analysis 

24.21. The general n-way classifications to which variance-analysis may be applied
are not necessarily determined by a measurable variate. As for contingency tables, rows 
or columns can be interchanged without affecting the analysis. We can, however, regard 
a multivariate frequency table as an n-way classification and apply variance-analysis to 
it; and just as regression and correlation analysis provide a refinement on contingency 
analysis because of the arrangement of the classes in order by reference to a variate, so we 
may to some extent refine the analysis of variance in such a case. 

24.22. Consider in the first instance a p × q table of frequencies in the form of a
correlation table. We will suppose the A-classification to be according to the variate x
and the B-classification according to y. Let us now consider the hypothesis that the data
emanate from a normal bivariate population with zero correlation (or, somewhat more
generally, that for any given y the x's are distributed normally with the same mean and
variance). We can then regard the data as a one-way classification according to y with
unequal frequencies and analyse the variance in the usual form:

                     Sum of Squares.                              d.f.     Quotient.
Between classes      $\sum_j n_j(\bar{x}_j - \bar{x})^2$          q - 1    $N\eta^2 \operatorname{var} x/(q-1)$
Residual             $\sum_{i,j}(x_{ij} - \bar{x}_j)^2$           N - q    $N(1-\eta^2)\operatorname{var} x/(N-q)$
Totals               $N \operatorname{var} x$                     N - 1

Here $\bar{x}_j$ is the mean of the $n_j$ x-values in the jth y-class, $\bar{x}$ is the mean of all N values, $x_{ij}$ is the
variate-value in the ith x-class and jth y-class, and there are q y-classes. The quotients
are expressible in terms of the correlation ratio of x on y, viz. $\eta_{xy}$ (cf. 14.23, vol. I, p. 351).

Now, on our hypothesis, the sums of squares between classes and the residual are 
independently distributed in the Type III form, and hence the variance ratio 


$$\frac{\eta^2}{q-1}\cdot\frac{N-q}{1-\eta^2} \qquad (24.41)$$

can be tested in Fisher's distribution with $\nu_1 = q - 1$, $\nu_2 = N - q$. This is the test we
gave in 14.25 (vol. I, p. 353) and it is reached by an argument of essentially the same
kind.
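
As an illustration of (24.41), the sketch below computes the correlation ratio and the variance ratio from a one-way classification given as lists of x-values, one list per y-class. It is a minimal example under our own assumptions: the function name `correlation_ratio_test` and the small data set are hypothetical and do not come from the text.

```python
def correlation_ratio_test(classes):
    """Return (eta_squared, F, nu1, nu2) for x-values grouped by y-class.

    F is the ratio (24.41): eta^2/(q - 1) times (N - q)/(1 - eta^2).
    """
    q = len(classes)
    N = sum(len(c) for c in classes)
    grand_mean = sum(sum(c) for c in classes) / N
    # Sum of squares between classes: sum of n_j (xbar_j - xbar)^2.
    between = sum(len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in classes)
    # Residual sum of squares within classes.
    within = sum((x - sum(c) / len(c)) ** 2 for c in classes for x in c)
    eta_sq = between / (between + within)
    F = (eta_sq / (q - 1)) * ((N - q) / (1 - eta_sq))
    return eta_sq, F, q - 1, N - q


# Hypothetical data: three y-classes with three x-values each.
eta_sq, F, nu1, nu2 = correlation_ratio_test([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(eta_sq, F, nu1, nu2)
```

Because the ratio is written in terms of the correlation ratio, it agrees with the ratio of the two quotients in the table, the common factor N var x cancelling.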


24.23. Now suppose that our p × q table is normal but correlated; or, somewhat
more generally, that the values in arrays of constant y are normally distributed with the
same variance but with means which vary linearly with y, say

$$m_j = m + b y_j. \qquad (24.42)$$

Then our data can be represented by the form

$$x_{ij} = m + b y_j + \zeta_{ij}, \qquad (24.43)$$

where the $\zeta$'s are distributed normally with zero mean and the same variance v. Apart
from the constant m, the only unknown here is the constant b. Our least-squares estimates
(measuring from the means of x and y) now lead to the familiar form for the regression
coefficient

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}, \qquad (24.44)$$

where summation takes place over all values observed. This is, of course, equivalent to

$$b = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}. \qquad (24.45)$$

Further, the reduction in sum of squares attributable to fitting the constant b is

$$N b \operatorname{cov}(x, y) = \frac{N \operatorname{cov}^2(x, y)}{\operatorname{var} y} = N r^2 \operatorname{var} x, \qquad (24.46)$$

where r is the correlation coefficient of the sample.

Our analysis of variance may then be written— 


TABLE 24.7

Analysis of Variance of a Correlation Table

                                                   Sum of Squares.                          d.f.     Quotient.
Regression constant b                              $N r^2 \operatorname{var} x$             1        $N r^2 \operatorname{var} x$
Between classes (after regression is eliminated)   $N(\eta^2 - r^2)\operatorname{var} x$    q - 2    $N(\eta^2 - r^2)\operatorname{var} x/(q-2)$
Residual                                           $N(1 - \eta^2)\operatorname{var} x$      N - q    $N(1 - \eta^2)\operatorname{var} x/(N-q)$
Totals                                             $N \operatorname{var} x$                 N - 1



This analysis gives us a test of the significance of the correlation coefficient in samples 
from an uncorrelated population and also of linearity of regression. 

In fact, if the parent correlation is zero, the parent value of b is zero and the quotient
due to b is independent of the sum of the other items in the analysis. Thus the ratio

$$\frac{N r^2 \operatorname{var} x}{N(1 - r^2)\operatorname{var} x/(N-2)} = \frac{r^2 (N-2)}{1 - r^2} \qquad (24.47)$$

is distributed in Fisher's form with $\nu_1 = 1$, $\nu_2 = N - 2$. This is equivalent to saying that

$$t = \sqrt{\frac{r^2 (N-2)}{1 - r^2}} \qquad (24.48)$$

is distributed in "Student's" form with N - 2 d.f., which brings us back by a different
route to the test given in 14.15 (vol. I, p. 342).

24.24. Secondly, if we assume that the parent correlation is not zero but the regression
is linear, the sum of squares between classes after regression is eliminated is independent
of the residual in Table 24.7, and hence the ratio

$$\frac{N \operatorname{var} x\,(\eta^2 - r^2)/(q-2)}{N \operatorname{var} x\,(1 - \eta^2)/(N-q)} = \frac{\eta^2 - r^2}{q-2}\cdot\frac{N-q}{1-\eta^2} \qquad (24.49)$$

is distributed in Fisher's form with $\nu_1 = q - 2$, $\nu_2 = N - q$. This test (due to Fisher
himself) gives a test of linearity of regression in the normal case.
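
The two tests derived from Table 24.7 can be sketched together. The function below is an illustrative assumption of ours (its name and arguments do not come from the text): given N, q, the sample correlation r and the squared correlation ratio, it returns the ratio (24.47) for the correlation coefficient, the equivalent "Student's" t, and the ratio (24.49) for linearity of regression.

```python
from math import sqrt


def correlation_table_tests(N, q, r, eta_sq):
    """Ratios obtained from Table 24.7 (var x cancels throughout).

    Returns a dict with the F for the correlation coefficient (24.47),
    the equivalent t with N - 2 d.f., and the F for linearity (24.49).
    """
    if not (r * r <= eta_sq < 1.0):
        raise ValueError("need r^2 <= eta^2 < 1")
    f_corr = r * r * (N - 2) / (1 - r * r)
    f_lin = ((eta_sq - r * r) / (q - 2)) * ((N - q) / (1 - eta_sq))
    return {
        "F_correlation": (f_corr, 1, N - 2),
        "t_correlation": (sqrt(f_corr), N - 2),
        "F_linearity": (f_lin, q - 2, N - q),
    }


# Hypothetical values: 50 observations in 6 arrays, r = 0.6, eta^2 = 0.40.
for name, value in correlation_table_tests(50, 6, 0.6, 0.40).items():
    print(name, value)
```

A small linearity ratio, as in this hypothetical case, would give no ground for rejecting linear regression, while the correlation itself may still be clearly significant.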

It should be noticed that this test is only approximate if the classification is one of
a normal population with broad groupings. If correlation exists, the distribution of a
bivariate normal sample in an array of finite width is not exactly normal, being the sum
of a number of normal distributions with slightly different means. Unless the grouping
is very coarse, this is not likely to invalidate tests of significance in practice.


24.25. Consider now the general regression formula for p variates,

$$x_1 = b_2 x_2 + b_3 x_3 + \dots + b_p x_p. \qquad (24.50)$$

If we assume that the residuals $x_1 - \sum_{j=2}^{p} b_j x_j$ (say $\zeta$) are distributed normally with
constant variance, our least-squares estimates of the regression coefficients are those given
by the usual theory, and the fitting of (p - 1) constants reduces the sum of squares by
$N \operatorname{var} x_1\, R^2$, where R is the multiple correlation coefficient (cf. 15.16, vol. I, p. 380). We
then have the analysis:


                                         Sum of Squares.                        d.f.     Quotient.
Between classes (regression constants)   $N \operatorname{var} x_1\, R^2$       p - 1    $R^2 N \operatorname{var} x_1/(p-1)$
Residual                                 $N \operatorname{var} x_1\,(1 - R^2)$  N - p    $(1 - R^2) N \operatorname{var} x_1/(N-p)$
Totals                                   $N \operatorname{var} x_1$             N - 1


If the regression is in fact linear of type (24.50), the residual quotient is independent of
that due to fitting regression constants, and the hypothesis may be tested by means of
the ratio

$$\frac{R^2}{p-1}\cdot\frac{N-p}{1-R^2}, \qquad (24.51)$$

which is distributed in Fisher's form with $\nu_1 = p - 1$, $\nu_2 = N - p$. This brings us to
the distribution of $R^2$ given in 15.20.


24.26. It is to be observed that in (24.50) we may choose the variates $x_2, \dots, x_p$
as we please. In particular, we can take them to be polynomials of a single variate. From
this point of view the analysis of variance links up with the theory of regression analysis, 
given in Chapter 22. If the polynomials are orthogonal we can fit the constants b one 
at a time, the fitting of any constant leaving unchanged the previous determination of those 
of lower orders. The reduction in sum of squares for each constant can be separately 
ascertained and corresponds to the loss of a further degree of freedom ; and at any stage 
we may test the residual variance to see whether any particular term is worth while in the 
sense that it makes a significant contribution to the total variance. The exact test, of 
course, depends on the usual assumptions of normality. 


24.27. The reader is now in a position to see a number of statistical topics which
on the surface appear to be distinct as parts of a single theory. Regression analysis, with
its subsidiary of correlation analysis, proceeds by the successive fitting of constants by
least-squares. For the normal case this is equivalent to estimation by maximum likelihood.
Partial and multiple regression, together with curvilinear regression, can all be subsumed
under this central idea. The fitting of each constant splits off a separate contribution to
the total variance which, under certain hypotheses, is independent of the others. Variance-analysis
proceeds in much the same way, but is more general in the sense that it can deal
with the classification of values, however determined. Our various exact tests of significance
of homogeneity in variance, of linearity of regression, of significance of correlations
in uncorrelated material, of the difference of two means where variances are equal, of the
correlation ratios, of the multiple correlation coefficient, all derive ultimately from Fisher's
distribution of the variance-ratio in the normal case.


The Analysis of Covariance 

24.28. Suppose that we have a one-way classification, possibly with unequal numbers,
and that in each class the members present values not of a single variate, such as we have
considered up to now, but pairs of variate-values typified by $x_{ij}$, $y_{ij}$, j referring as usual
to class and i to the number within the class. By the ordinary methods of variance-analysis
we can discuss the effect of classification either on the x-variate or on the y-variate; but
there also arises for consideration the effect of class-membership on the covariation of
x and y. This leads us to an extension of the analysis of variance to that of covariance.


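24.29. By an easy extension of the results for a single variate we have, analogously to

$$\sum_{i,j}(x_{ij} - \bar{x}_{..})^2 = \sum_{i,j}(x_{ij} - \bar{x}_{.j})^2 + \sum_j n_j(\bar{x}_{.j} - \bar{x}_{..})^2,$$

the equation in product terms

$$\sum_{i,j}(x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..}) = \sum_{i,j}(x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}) + \sum_j n_j(\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}). \qquad (24.52)$$

If we consider the whole sample as homogeneous the correlation between x and y is given by

$$r = \frac{\sum_{i,j}(x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\left\{\sum_{i,j}(x_{ij} - \bar{x}_{..})^2 \sum_{i,j}(y_{ij} - \bar{y}_{..})^2\right\}^{1/2}}. \qquad (24.53)$$

We have also the correlation between means of classes

$$\frac{\sum_j n_j(\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..})}{\left\{\sum_j n_j(\bar{x}_{.j} - \bar{x}_{..})^2 \sum_j n_j(\bar{y}_{.j} - \bar{y}_{..})^2\right\}^{1/2}} \qquad (24.54)$$

and may calculate a correlation of residuals within classes

$$\frac{\sum_{i,j}(x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j})}{\left\{\sum_{i,j}(x_{ij} - \bar{x}_{.j})^2 \sum_{i,j}(y_{ij} - \bar{y}_{.j})^2\right\}^{1/2}}. \qquad (24.55)$$
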

24.30. If there is heterogeneity present we should expect these correlations to differ;
and similarly for the three kinds of regression of y on x, such as

$$\frac{\sum_{i,j}(x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\sum_{i,j}(x_{ij} - \bar{x}_{..})^2}. \qquad (24.56)$$


The three correlations of (24.53)-(24.55) are, however, not additive, like sums of squares;
nor are the regressions corresponding. The covariances expressed by (24.52) are additive,
but there is no simple test, such as exists for variance-ratios, to determine the significance
of differences or ratios of covariances. Covariance analysis, however, is not primarily
designed to test independence, but to examine whether there is any variation according
to class between the regressions of y on x within and between classes. Let us suppose
that there is some linear relation of the form


$$Y - \mu_y = \beta (X - \mu_x). \qquad (24.57)$$

Following the notation of E. S. Pearson, we write

$$C_{11j} = \sum_i (x_{ij} - \bar{x}_{.j})^2, \quad C_{22j} = \sum_i (y_{ij} - \bar{y}_{.j})^2, \quad C_{12j} = \sum_i (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}), \qquad (24.58)$$

$$C_{11a} = \sum_j C_{11j}, \quad C_{22a} = \sum_j C_{22j}, \quad C_{12a} = \sum_j C_{12j}, \qquad (24.59)$$

$$C_{11m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2, \quad C_{22m} = \sum_j n_j (\bar{y}_{.j} - \bar{y}_{..})^2, \quad C_{12m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}), \qquad (24.60)$$

and $C_{110}$, $C_{220}$, $C_{120}$ for the corresponding total sums of squares and products. We may
then exhibit the composition of the total sums of squares and products in the form of Table
24.8. The arithmetic of the analysis follows that of ordinary variance-analysis. We
shall give an example presently.


TABLE 24.8

Analysis of Variance and Covariance for One-Way Classification: Sums of Squares and
Products and Regression Coefficients.

Variation.          d.f.        Sum of Squares,   Sum of Squares,   Sum of Products.   Regression Coefficients.
                                x-variate.        y-variate.
Within jth group    $n_j - 1$   $C_{11j}$         $C_{22j}$         $C_{12j}$          $b_j = C_{12j}/C_{11j}$
Within groups       N - p       $C_{11a}$         $C_{22a}$         $C_{12a}$          $b_a = C_{12a}/C_{11a}$
Between groups      p - 1       $C_{11m}$         $C_{22m}$         $C_{12m}$          $b_m = C_{12m}/C_{11m}$
Totals              N - 1       $C_{110}$         $C_{220}$         $C_{120}$          $b_0 = C_{120}/C_{110}$



We now suppose that, apart from the regression effects represented by (24.57), the
variation of y is normal with constant variance v. We can then compile various estimates
of v from the residual variation after the effect of fitting regression constants has been
removed. For instance, within classes we have for the estimator of v, with N - 2p degrees
of freedom,

$$\frac{1}{N - 2p}\sum_j \sum_i \{y_{ij} - \bar{y}_{.j} - b_j (x_{ij} - \bar{x}_{.j})\}^2 = \frac{1}{N - 2p}\sum_j (C_{22j} - b_j C_{12j}) = \frac{S_1}{N - 2p}, \text{ say.}$$

The number of degrees of freedom follows from the fact that we have fitted a mean and
a regression coefficient to each of p classes, making a reduction of 2p in all. We then obtain
Table 24.9.


TABLE 24.9

Analysis of Covariance for One-Way Classification with Linear Regressions.

Variation due to                                    d.f.        Sum of Squares.

Deviations from linear regressions                  N - 2p      $\sum_j \sum_i \{y_{ij} - \bar{y}_{.j} - b_j(x_{ij} - \bar{x}_{.j})\}^2$
within classes                                                  $= \sum_j (C_{22j} - b_j C_{12j}) = S_1$

Differences among regressions                       p - 1       $\sum_j (b_j - b_a)^2 \sum_i (x_{ij} - \bar{x}_{.j})^2$
                                                                $= \sum_j (b_j C_{12j}) - b_a C_{12a} = S_2$

Deviations within classes from                      N - p - 1   $\sum_j \sum_i \{y_{ij} - \bar{y}_{.j} - b_a(x_{ij} - \bar{x}_{.j})\}^2$
linear regression $b_a$                                         $= C_{22a} - b_a C_{12a} = S_1 + S_2$

Deviations between classes from                     p - 2       $\sum_j n_j \{\bar{y}_{.j} - \bar{y}_{..} - b_m(\bar{x}_{.j} - \bar{x}_{..})\}^2$
linear regression $b_m$                                         $= C_{22m} - b_m C_{12m} = S_3$

Differences between $b_a$ and $b_m$                 1           $\sum_j \sum_i \{(b_a - b_0)(x_{ij} - \bar{x}_{.j}) + (b_m - b_0)(\bar{x}_{.j} - \bar{x}_{..})\}^2$
                                                                $= (b_a - b_m)^2\, C_{11a} C_{11m}/C_{110} = S_4$

Total deviation from linear regression $b_0$        N - 2       $\sum_j \sum_i \{y_{ij} - \bar{y}_{..} - b_0(x_{ij} - \bar{x}_{..})\}^2$
                                                                $= C_{220} - b_0 C_{120}$



The reader will probably find it useful to check the expressions in the third column of 
Table 24.9 and to examine how the sum of squares of deviations from the regression line 
of the whole is analysed into the constituent items. 
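
The check suggested above can be sketched as follows. This is our own working, using only the definitions $b_j = C_{12j}/C_{11j}$, $b_a = C_{12a}/C_{11a}$, $b_m = C_{12m}/C_{11m}$, $b_0 = C_{120}/C_{110}$ and the additivity $C_{110} = C_{11a} + C_{11m}$, $C_{120} = C_{12a} + C_{12m}$; it is offered as a guide and not as the derivation of the original.

```latex
\begin{align*}
S_2 &= \sum_j (b_j - b_a)^2 C_{11j}
     = \sum_j b_j^2 C_{11j} - 2 b_a \sum_j b_j C_{11j} + b_a^2 C_{11a} \\
    &= \sum_j b_j C_{12j} - 2 b_a C_{12a} + b_a C_{12a}
     = \sum_j b_j C_{12j} - b_a C_{12a}, \\[6pt]
S_1 + S_2 + S_3 &= C_{220} - \left( b_a C_{12a} + b_m C_{12m} \right), \\[6pt]
b_a C_{12a} + b_m C_{12m} - b_0 C_{120}
    &= b_a^2 C_{11a} + b_m^2 C_{11m}
       - \frac{(b_a C_{11a} + b_m C_{11m})^2}{C_{11a} + C_{11m}} \\
    &= (b_a - b_m)^2 \, \frac{C_{11a} C_{11m}}{C_{110}} = S_4 .
\end{align*}
```

Hence $S_1 + S_2 + S_3 + S_4 = C_{220} - b_0 C_{120}$, and the degrees of freedom add in the same way: $(N - 2p) + (p - 1) + (p - 2) + 1 = N - 2$.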


24.31. Suppose now that we wish to test whether the relationship between x and y
can be represented by the formula (24.57), and that there is no material class-effect present.
Then $S_1$ of Table 24.9 should be an unbiassed estimator of (N - 2p) v and should be independent
of the residual estimator $S_2 + S_3 + S_4$, which has 2p - 2 d.f. We may therefore
test the hypothesis by the ratio

$$\frac{S_1}{N - 2p}\cdot\frac{2p - 2}{S_2 + S_3 + S_4}, \qquad \nu_1 = N - 2p, \quad \nu_2 = 2p - 2. \qquad (24.61)$$

If this variance ratio is insignificant we consider next whether the regressions differ in
the p classes. For this purpose we compare the estimator derived from $S_2$ with that based
on $S_1$; i.e. the ratio

$$\frac{S_2}{p - 1}\cdot\frac{N - 2p}{S_1}, \qquad \nu_1 = p - 1, \quad \nu_2 = N - 2p, \qquad (24.62)$$

will be significant if differences are to be regarded as real.

If this ratio is not significant, $S_1$ and $S_2$ may be pooled. Comparison of their sum
with $S_3$ will afford a test whether the relation between group means is linear. The ratio
for this purpose is

$$\frac{S_1 + S_2}{N - p - 1}\cdot\frac{p - 2}{S_3}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = p - 2. \qquad (24.63)$$

Finally, even if this ratio is not significant, it does not follow that the common regression
within groups is the same as the regression of the means of groups. To test this point
we consider the ratio

$$\frac{S_1 + S_2}{N - p - 1}\cdot\frac{1}{S_4}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = 1. \qquad (24.64)$$

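
The four ratios may be collected in a single routine. The sketch below is only illustrative: the function name `covariance_ratios` is ours, the ratios are written exactly as in (24.61) to (24.64) (so that in three of them the within-class estimator is the numerator), and the figures used in the call are those of Table 24.11 of the example which follows.

```python
def covariance_ratios(S1, S2, S3, S4, N, p):
    """Return {equation: (ratio, nu1, nu2)} for (24.61)-(24.64).

    S1..S4 are the sums of Table 24.9; N is the total number of
    observations and p the number of classes (p > 2 is assumed, so
    that S3 has at least one degree of freedom).
    """
    if p <= 2 or N <= 2 * p:
        raise ValueError("need p > 2 and N > 2p")
    pooled = (S1 + S2) / (N - p - 1)
    return {
        "24.61": ((S1 / (N - 2 * p)) * ((2 * p - 2) / (S2 + S3 + S4)),
                  N - 2 * p, 2 * p - 2),
        "24.62": ((S2 / (p - 1)) * ((N - 2 * p) / S1), p - 1, N - 2 * p),
        "24.63": (pooled * ((p - 2) / S3), N - p - 1, p - 2),
        "24.64": (pooled / S4, N - p - 1, 1),
    }


# Figures of Table 24.11 (N = 35 recruits in p = 3 groups).
for eq, (ratio, nu1, nu2) in covariance_ratios(
        1048.1, 175.4, 835.6, 19.3, N=35, p=3).items():
    print(f"({eq})  ratio = {ratio:.4f}  nu1 = {nu1}  nu2 = {nu2}")
```

A ratio far below unity, as well as one far above it, points to a discrepancy between the two estimators compared, so either tail of the variance-ratio distribution may be relevant when the within-class estimator is the numerator.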


Example 24.7 

A number of recruits are given a preliminary test to ascertain their suitability for a 
certain course of training. At the end of the training course they undergo a proficiency 
test. The marks for three groups of recruits from three different towns are— 

Group 1   Preliminary: 45, 50, 56, 58, 59, 60, 62, 64, 65, 75
          Proficiency: 46, 60, 52, 46, 48, 50, 55, 63, 58, 64

Group 2   Preliminary: 44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73
          Proficiency: 48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 76, 81

Group 3   Preliminary: 47, 52, 59, 60, 63, 66, 68, 69, 74, 76
          Proficiency: 43, 56, 51, 72, 60, 61, 55, 74, 72, 80

We are interested here in the efficiency of the preliminary test as a predictor of the
proficiency test. We therefore consider the regression of the marks obtained in the latter
(y) on those obtained in the former (x). We are, however, also very much interested in
the question whether the regressions are the same, apart from purely sampling effects,
in the three groups. Such a matter would naturally arise, for instance, if we were thinking
of applying the same rejection standards in preliminary tests to all recruits, irrespective of
their town of origin.

Our scores are given to the nearest unit, and hence the variates are discontinuous. 
We will neglect this effect and assume that the scores are distributed approximately 
normally. 


About origin x = y = 50 the sums of squares and cross-products are:

             n     Σ(x)    Σ(y)    Σ(x²)    Σ(y²)    Σ(xy)
Group 1     10      94      42     1496      594      694
Group 2     15     162     257     2802     6101     3989
Group 3     10     134     124     2556     2776     2422


We can then calculate the quantities C. For instance,

$$C_{111} = 1496 - \frac{94^2}{10} = 612.4,$$

$$C_{121} = 694 - \frac{42 \times 94}{10} = 299.2,$$

$$C_{11a} = C_{111} + C_{112} + C_{113}, \text{ etc.}$$

We find the following table in the form of Table 24.8:


TABLE 24.10

Analysis of Variance and Covariance for Data of Example 24.7: Sums of Squares and Products
and Regressions

Variation.             d.f.   Sum of Squares,        Sum of Squares,         Sum of Products.       Regressions.
                              x-variate.             y-variate.
Within first group       9    $C_{111}$ = 612.4      $C_{221}$ = 417.6       $C_{121}$ = 299.2      $b_1$ = 0.4886
Within second group     14    $C_{112}$ = 1052.4     $C_{222}$ = 1697.73     $C_{122}$ = 1213.4     $b_2$ = 1.1530
Within third group       9    $C_{113}$ = 760.4      $C_{223}$ = 1238.4      $C_{123}$ = 760.4      $b_3$ = 1.0000
Within groups           32    $C_{11a}$ = 2425.2     $C_{22a}$ = 3353.73     $C_{12a}$ = 2273.0     $b_a$ = 0.9372
Between groups           2    $C_{11m}$ = 83.09      $C_{22m}$ = 1005.01     $C_{12m}$ = 118.57     $b_m$ = 1.4270
Totals                  34    $C_{110}$ = 2508.29    $C_{220}$ = 4358.74     $C_{120}$ = 2391.57    $b_0$ = 0.9535



A comparison of the three regressions within groups indicates some heterogeneity. 
It looks as if the preliminary test is not such a good predictor for the first group as for 
the others. We may proceed to test the reality of this effect by constructing Table 24.11 
on the lines of Table 24.9. For instance, 

$$S_1 = \sum_j (C_{22j} - C_{12j} b_j) = (417.6 - 299.2 \times 0.4886) + (\text{two similar terms}) = 1048.1.$$

We find:

TABLE 24.11

Analysis of Covariance of Data of Example 24.7: Linear Regressions.

Variation.                               d.f.    Sums S.                                  Quotient.
Deviations from regressions $b_j$         29     $S_1$ = 1048.1                             36.1
Differences $b_j$                          2     $S_2$ = 175.4                              87.7
Deviations from $b_a$                     31     $S_1 + S_2$ = 1223.5                       39.5
Deviations of groups from $b_m$            1     $S_3$ = 835.6                             835.6
Difference between $b_a$ and $b_m$         1     $S_4$ = 19.3                               19.3
Totals                                    33     $S_1 + S_2 + S_3 + S_4$ = 2078.4



A comparison of the quotient 36.1 (29 d.f.) with the quotient of the remaining items,
257.6 (4 d.f.) indicates that there are real differences between classes. A single regression
equation will not represent all three class-relations. A comparison of the deviations from
regressions, 36.1 (29 d.f.), with the differences of regressions among themselves, 87.7
(2 d.f.), does not reject the hypothesis of equality of regressions within groups. We therefore
compare the deviations from $b_a$, 39.5 (31 d.f.), with the deviations of groups from $b_m$,
835.6 (1 d.f.). This is significant, suggesting that the hypothesis of linearity of regression
of group-means should be rejected.

The general result is to confirm our suspicion of heterogeneity. The correlation 
coefficients between x and y are— 


Within first group     0.592
Within second group    0.908
Within third group     0.784
Within groups          0.797
Between groups         0.410
Total                  0.722



Again the deviations between groups stand out as indicating heterogeneity. 
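
The arithmetic of Tables 24.10 and 24.11 can be retraced from the sums about the origin x = y = 50. The sketch below is ours and is offered only as a check: the variable names are illustrative, and Σ(x²) for the third group is taken as 2556, the value which agrees with the marks listed and with $C_{113}$ = 760.4. Small differences from the printed table in the last figure arise from rounding of the regression coefficients.

```python
# (n, sum x, sum y, sum x^2, sum y^2, sum xy) about the origin x = y = 50.
GROUPS = [
    (10, 94, 42, 1496, 594, 694),
    (15, 162, 257, 2802, 6101, 3989),
    (10, 134, 124, 2556, 2776, 2422),  # sum x^2 assumed to be 2556
]


def corrected(n, sx, sy, sxx, syy, sxy):
    """Sums of squares and products (C11, C22, C12) about the means."""
    return sxx - sx * sx / n, syy - sy * sy / n, sxy - sx * sy / n


within = [corrected(*g) for g in GROUPS]
C11a, C22a, C12a = (sum(c[k] for c in within) for k in range(3))
totals = [sum(g[k] for g in GROUPS) for k in range(6)]
C110, C220, C120 = corrected(*totals)
C11m, C22m, C12m = C110 - C11a, C220 - C22a, C120 - C12a

# Regression coefficients of Table 24.10.
b = [c12 / c11 for c11, _, c12 in within]
b_a, b_m, b_0 = C12a / C11a, C12m / C11m, C120 / C110

# Sums of Table 24.11, following the third column of Table 24.9.
S1 = sum(c22 - c12 * c12 / c11 for c11, c22, c12 in within)
S2 = (C22a - b_a * C12a) - S1
S3 = C22m - b_m * C12m
S4 = (b_a - b_m) ** 2 * C11a * C11m / C110
total = C220 - b_0 * C120

N, p = totals[0], len(GROUPS)
rows = [("S1", S1, N - 2 * p), ("S2", S2, p - 1), ("S3", S3, p - 2),
        ("S4", S4, 1), ("total", total, N - 2)]
for name, S, df in rows:
    print(f"{name:5s} {S:9.2f}  d.f. {df:2d}")
```

The same quantities give the correlations quoted above, each being a sum of products divided by the square root of the product of the two corresponding sums of squares.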

24.32. The analysis of covariance may be extended to the case where there is more 
than one independent variate. The regression coefficients are found in the usual way, 
and the sums of squares after regressions have been removed can be found and compared 
on the usual hypotheses. Suppose, for instance, there are two independent variates and 
a classification giving an analysis between classes and residual. We may represent the 
analysis thus:— 



                     d.f.    Sum of Squares.                   Sum of Products.
                             $x_1^2$    $x_2^2$    $y^2$       $x_1 x_2$    $y x_1$    $y x_2$
Between classes      n       A          B          C           P            Q          R
Residual             n'      A'         B'         C'          P'           Q'         R'
Totals               n''     A''        B''        C''         P''          Q''        R''

Our regressions are then:

                     $b_1$                                      $b_2$
Between classes      $\dfrac{BQ - PR}{AB - P^2}$                $\dfrac{AR - PQ}{AB - P^2}$
Residual             $\dfrac{B'Q' - P'R'}{A'B' - P'^2}$         $\dfrac{A'R' - P'Q'}{A'B' - P'^2}$
Totals               $\dfrac{B''Q'' - P''R''}{A''B'' - P''^2}$  $\dfrac{A''R'' - P''Q''}{A''B'' - P''^2}$

The sums of squares C can then be reduced by eliminating regressions, i.e. by subtracting
$Q b_1 + R b_2$, giving

$$C - \frac{BQ^2 - PQR}{AB - P^2} - \frac{AR^2 - PQR}{AB - P^2} = \frac{ABC - AR^2 - BQ^2 - CP^2 + 2PQR}{AB - P^2}. \qquad (24.65)$$



This and the analogous quantities with primes give independent estimators of the 
variance of the residual element, and a comparison to test homogeneity may be made in 
the usual way. 
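
The reduction (24.65) is easily verified numerically. The following sketch is ours: the function name `reduce_for_two_regressions` is an illustrative assumption, and the values of A, B, C, P, Q, R in the example are hypothetical. It computes $b_1$ and $b_2$, subtracts $Q b_1 + R b_2$ from C, and compares the result with the closed form on the right of (24.65).

```python
def reduce_for_two_regressions(A, B, C, P, Q, R):
    """Return (b1, b2, reduced) for one row of the analysis.

    A, B, C are the sums of squares of x1, x2, y; P, Q, R the sums of
    products x1*x2, y*x1, y*x2. 'reduced' is C - (Q*b1 + R*b2).
    """
    det = A * B - P * P
    if det == 0:
        raise ValueError("x1 and x2 are linearly dependent in this row")
    b1 = (B * Q - P * R) / det
    b2 = (A * R - P * Q) / det
    return b1, b2, C - (Q * b1 + R * b2)


def closed_form(A, B, C, P, Q, R):
    """Right-hand side of (24.65)."""
    return (A * B * C - A * R * R - B * Q * Q - C * P * P
            + 2 * P * Q * R) / (A * B - P * P)


# Hypothetical sums of squares and products for a single row.
row = dict(A=10.0, B=8.0, C=20.0, P=2.0, Q=6.0, R=4.0)
b1, b2, reduced = reduce_for_two_regressions(**row)
print(b1, b2, reduced, closed_form(**row))
```

The same routine applies to the primed quantities of the residual row and to the totals.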


24.33. In a case such as that of Example 24.7 it is evident that a comparison of
y-means between groups is affected by what we know about the x-values. If we know nothing
about the latter, comparison of the y's is a univariate problem and can be treated by the
methods already discussed, the difference of means, for example, being tested by the use
of standard errors or the t-test. But suppose that our x's themselves are found to be different
between groups and that there is significant correlation between x and y. Then
it is possible that the relation, if any, between y's in different groups is not, so to speak,
an inherent quality of the variation of y, but is merely a reflection of their dependence on
the x's, which happen to exhibit significant differences. In Example 24.7, differences in
proficiency between groups may be due simply to differences of ability which were present
before the training began and, if so, should be shown by differences between groups in the
preliminary scores. We should not then be able to conclude from proficiency scores alone
that training in one group had a more marked effect than in another. The differences
were there before the training was applied.

24.34. If, then, we require to consider the effects of training alone on the groups,
we may "correct" the y-values by deducting the estimates

$$Y_{ij} = \bar{y}_{..} + b_0 (x_{ij} - \bar{x}_{..}) \qquad (24.66)$$

or other more general regression equations. This, so to speak, allows for differences due
to variations of the x-variate.

Assuming that one linear regression equation adequately describes the relationship
between y and x, so that the corrected values are

$$y_{ij} - Y_{ij} = y_{ij} - \bar{y}_{..} - b_0 (x_{ij} - \bar{x}_{..}), \qquad (24.67)$$

we see that the difference of the corrected means of two classes $\bar{y}_{.j}$ and $\bar{y}_{.k}$ is

$$\bar{y}_{.j} - \bar{y}_{.k} - b_0 (\bar{x}_{.j} - \bar{x}_{.k}). \qquad (24.68)$$

This may be regarded as the sum of two parts which are independent. The estimated
variance of the first part, $\bar{y}_{.j} - \bar{y}_{.k}$, is $2s^2/q$, where $s^2$ is the mean-square of the residual after
correcting for regression and the means $\bar{y}_{.j}$ and $\bar{y}_{.k}$ are both based on q members. Similarly
the variance of b is $s^2/A$, where A is the sum of squares of the variate entering into
the residual row of the analysis. Regarding the x's as fixed from sample to sample, so
that our inference is conditional, we see that the variance of the difference (24.68) is given by

$$s^2 \left\{ \frac{2}{q} + \frac{(\bar{x}_{.j} - \bar{x}_{.k})^2}{A} \right\}. \qquad (24.69)$$

The ratio of the difference to the square root of this expression is distributed as "Student's"
t, with degrees of freedom one fewer in number than those of the original residual.
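
A short sketch of the calculation may make the procedure concrete. It assumes the variance of the corrected difference to be $s^2\{2/q + (\bar{x}_{.j} - \bar{x}_{.k})^2/A\}$, as follows from adding the variances of the two independent parts described above; the function name `corrected_difference_t` and all numerical values are hypothetical.

```python
from math import sqrt


def corrected_difference_t(ybar_j, ybar_k, xbar_j, xbar_k, b, s2, q, A):
    """Corrected difference of two class means, its variance, and t.

    b  : regression coefficient used for the correction
    s2 : residual mean-square after correcting for regression
    q  : number of members on which each class mean is based
    A  : sum of squares of x in the residual row of the analysis
    """
    diff = (ybar_j - ybar_k) - b * (xbar_j - xbar_k)
    variance = s2 * (2.0 / q + (xbar_j - xbar_k) ** 2 / A)
    return diff, variance, diff / sqrt(variance)


# Hypothetical figures: two classes of 10 members each.
diff, variance, t = corrected_difference_t(
    ybar_j=60.0, ybar_k=55.0, xbar_j=62.0, xbar_k=58.0,
    b=0.9, s2=36.0, q=10, A=2400.0)
print(diff, variance, t)
```

In this hypothetical case an uncorrected difference of 5 marks shrinks to 1.4 after correction, most of the apparent difference being attributable to the x-variate.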


24.35. Similarly, if we have two independent variables $x_1$ and $x_2$, the corrected
difference of y-means is

$$\bar{y}_{.j} - \bar{y}_{.k} - \{ b_1 (\bar{x}_{1j} - \bar{x}_{1k}) + b_2 (\bar{x}_{2j} - \bar{x}_{2k}) \}, \qquad (24.70)$$

where temporarily we write $\bar{x}_{1j}$ for the mean of the variate $x_1$ in the jth class, and so on.
The variance of the part in curly brackets may be derived by considering the variance of
the general expression $\lambda b_1 + \mu b_2$. From the equations for $b_1$ and $b_2$ we have

$$b_1 = \frac{B \sum (y x_1) - P \sum (y x_2)}{AB - P^2}, \qquad b_2 = \frac{-P \sum (y x_1) + A \sum (y x_2)}{AB - P^2}, \qquad (24.71)$$

where, as in 24.32, A and B are the sums of squares for $x_1$, $x_2$, and P is the cross-product.
Thus the coefficient of any y in $\lambda b_1 + \mu b_2$ is

$$\frac{(\lambda B - \mu P) x_1 + (\mu A - \lambda P) x_2}{AB - P^2}.$$

Since the y's are independent the estimated variance of $\lambda b_1 + \mu b_2$ is

$$\frac{s^2}{(AB - P^2)^2} \{ A (\lambda B - \mu P)^2 + 2P (\lambda B - \mu P)(\mu A - \lambda P) + B (\mu A - \lambda P)^2 \} = \frac{\lambda^2 B - 2\lambda\mu P + \mu^2 A}{AB - P^2}\, s^2. \qquad (24.72)$$

Thus for the estimated variance of the corrected difference (24.70) we have

$$s^2 \left\{ \frac{2}{q} + \frac{\lambda^2 B - 2\lambda\mu P + \mu^2 A}{AB - P^2} \right\}, \qquad (24.73)$$

where $\lambda = \bar{x}_{1j} - \bar{x}_{1k}$ and $\mu = \bar{x}_{2j} - \bar{x}_{2k}$. As usual, the difference divided by the square
root of this quantity may be tested in the t-distribution.

24.36. Our account of the analysis of variance and covariance has not attempted
to cover all the applications of the method in particular directions. We have concentrated
so far as possible on the fundamental ideas and the broad lines of analysis to which they
lead. Some further developments will be given in later chapters, but we must refer the
reader who requires a complete acquaintance with the subject to the references given at
the end of this chapter and the preceding. We will conclude our exposition with three
final comments.

(a) Part of our hypothesis throughout has been that the residual element $\zeta$ has constant
variance from one subclass to another. In Chapter 26 we shall discuss methods of testing
homogeneity in residual variance. For completeness we might perhaps have anticipated
some of these tests in the present chapter, at least to the extent of exemplifying their use.
We have not done so mainly for reasons of economy in space ; but the omission of mention 
of the point in foregoing examples should not lead the reader to overlook (as many writers 
do overlook) the necessity for testing variance-homogeneity where possible, if it is required 
as part of the hypothesis. 

(b) In the majority of our examples we have proceeded at once to analyses of variance
or covariance without dwelling on points which would require attention in any practical 
inquiry. For instance, since the primary function of many variance-analyses is to test 
the homogeneity of a set of class-means, the first stage would be to compute those means 
and examine whether they suggest any lack of homogeneity on intuitive grounds. Again, 
if heterogeneity is established, consideration of the means themselves, or of the primary 
data, will sometimes show how it arises. The student must never lose sight of his primary 
material. 

(c) Elaborating this point to some extent, we would emphasise that the analysis of
variance, like other statistical techniques, is not a mill which will grind out results automatically
without care or forethought on the part of the operator. It is a rather delicate
instrument which can be called into play when precision is needed, but requires skill as
well as enthusiasm to apply to the best advantage. The reader who roves among the
literature of the subject will sometimes find elaborate analyses applied to data in order to
prove something which was almost obvious from careful inspection right from the start;
or he will find results stated without qualification as "significant" without any attempt
at critical appreciation. This is not the occasion to deliver a homily on the necessity for
self-discipline in the use of advanced theoretical techniques, but the analysis of variance 
would provide quite a good text for a discourse on that interesting subject. 


NOTES AND REFERENCES 

For the analysis of variance where subclass frequencies are unequal, see Brandt (1933)
and an important paper by Yates (1934a). Wilks (1938e) has considered the subject from
the theoretical viewpoint and exhibited the main results determinantally. For the missing
plot technique see Allan and Wishart (1930) and Yates (1933b). For the analysis of
covariance see Fisher's Statistical Methods, Bartlett (1934a), an appendix by E. S. Pearson
to a paper by Wilsdon (1934), Brady (1935), Wishart (1936), and Day and Fisher (1937).
The last-mentioned paper works through a practical example in some detail and will
repay study.

See also references to the previous chapter.

EXERCISES

24.1. For a two-way classification with one member in each subclass show that,
for normal variation,

$$E (\bar{x}_{j.} - \bar{x}_{..})(\bar{x}_{.k} - \bar{x}_{..}) = 0,$$

and hence that the sums $\sum_j (\bar{x}_{j.} - \bar{x}_{..})^2$ and $\sum_k (\bar{x}_{.k} - \bar{x}_{..})^2$ are independent. Examine
how this breaks down for the non-orthogonal case.

24.2. Verify the arithmetic of Example 24.6.

24.3. Generalise formula (24.73) in the following way. If there are m independent
variates, the variance of corrected differences is

$$s^2 \left\{ \frac{2}{q} + \sum_{r,s=1}^{m} c_{rs} \lambda_r \lambda_s \right\},$$

where $\lambda_r = \bar{x}_{rj} - \bar{x}_{rk}$, and $c_{rs} = A_{rs}/|a_{rs}|$, where $A_{rs}$ is the cofactor of $a_{rs}$ in the determinant
$|a_{rs}|$, and $a_{rs} = \sum x_r x_s$ summed over the sample.

(Wishart, 1936.)

24.4. Derive by the analysis of variance the test of a regression coefficient given
in 22.19.




CHAPTER 25 

THE DESIGN OF SAMPLING INQUIRIES 

Influence of Theory on Sampling Design 

25.1. The reader who is accustomed to handling the results of a sampling investigation
as they appear in everyday statistical work may have wondered more than once in previous 
chapters whether theory was not reaching out too far in advance of practice. It is true 
that for certain types of experimental inquiry, notably in agricultural and biological research, 
the precision of exact statistical tests does not seem out of place ; but in economic or social 
statistics, for example, there is often so much error and imperfection in the raw data that 
the application of refined methods of analysis would be a waste of time. It is clearly 
useless, and may even be dangerous, to exercise an elaborate mathematical technique on 
data which are suspect from the very start of the inquiry. If our theory is to be really 
serviceable to the statistician and not merely an enticing mental exercise it must be capable 
of solving practical problems. 

25.2. Now it has to be admitted that much of the material with which statisticians
have to work at the present day cannot be treated by the methods expounded in the foregoing
pages when sampling questions are concerned. The commonest reason, but by no
means the only one, is that the sampling process by which the data were obtained was
biassed. In such cases the statistician has to lay aside the refined implements of his craft
and do the best he can with his refractory material in the light of his own judgment and
commonsense. A good deal of current statistical work is of this kind, and there is even
a section of thought which is inclined to depreciate the advanced theory of the subject as 
“ academic ” in the sense that it is too remote from practical affairs to be worth studying. 
The misunderstanding is not likely to be removed by the counter-accusation sometimes 
launched by theoreticians that the theory is quite capable of being applied by anyone who 
has the ability to comprehend it. 

25.3. Fortunately there is a growing realisation that the two points of view can 
often be reconciled by collecting the data in such a form that the theory can be applied to 
it. If only enough care is taken at the initial stages of an inquiry there is no need for the 
appearance of imperfect data which defy exact analysis. Knowing beforehand what 
theoretical instruments are at our disposal, and armed with a clear understanding of what 
questions we are trying to answer, we can frequently frame the investigation so as to maximise
the information acquired with the minimum of effort. In short, the scope and nature
of our theory itself dictates, to some extent, the form which the sampling inquiry should 
assume. In former times the statistician was usually asked to extract information from 
data which were collected by inexpert agents, frequently for quite different purposes. 
Nowadays he is still in the same position in some respects, but sometimes he is called in to 
advise on the design of the inquiry and can, within limits, determine the form in which the
data are collected. He can make his theory applicable by selecting his sample in the
proper way.

25.4. The general theory of the design of sampling inquiries has not progressed far 
enough for us to be able to give a systematic account of it in this chapter. In some fields, 
particularly that of agricultural experimentation, it has reached quite an advanced degree
of perfection; in others there remain many problems unsolved and possibly many more 
which have not yet even been formulated. At the risk of some discontinuity of treatment, 
therefore, we shall only give in this chapter a number of instances in which theoretical considerations
exert a considerable effect on the scope of a sampling inquiry, in order to illustrate
the field to be covered. There are, of course, many factors which ultimately determine
the form of an investigation, such as cost and expenditure of time, but they will
not concern us here. For the present we shall be concerned solely with the extent to which
theoretical considerations contribute to all the factors that have to be taken into account 
when an inquiry is designed. 

Some Preliminary Points 

25.5. There are certain preliminary points which, though obvious enough when stated 
explicitly, are often overlooked and cause a good deal of bad design. 

(a) The fundamental object of sampling is to obtain information about a population, 
and it is of the first importance to begin with a clear idea of what that population
is. Imagine, for instance, that we are asked to ascertain whether pasteurised milk has
a different feeding value from raw milk. In what population is this inquiry to be made:
among children? among the inhabitants of the British Isles? among those who habitually
drink milk or those who do not? among townspeople or among country folk? and so
on. Again, suppose that we are given a new variety of barley and wish to know whether
it has a heavier yield than a previously known type. Do we mean heavier in the usual
barley-growing areas? in every kind of climate or on the average over a series of different
climatic conditions? when subject to the same manurial treatments as those in current
use? and so on.

(b) In a similar way, it is necessary to have an equally clear idea of what we are trying 
to find out about the population. In our example of raw and pasteurised milk, are we
content to know that there is (or is not) a differential effect for children as a whole? or do
we wish to ascertain whether any such effect varies at different ages, between sexes, or
according to nutritional standards? What exactly should we like to know? It is no use
returning the facile reply “all about it” to this query, for our information must be limited
in virtue of the finite size of our sample. We must make up our minds what information 
we require and which questions have priority if it becomes necessary to sacrifice some of 
them for practical reasons. 

(c) Thirdly, we should consider what we know already about our population. This
point becomes of particular importance when our prior knowledge indicates heterogeneity,
for then we may, in effect, have to divide the population into sub-groups and sample separately
from each. In our milk example, it is to be expected that children of different ages
may react differently, or that children from lower-class schools may respond differently 
from those in middle-class schools. Or again, in our barley example, the two varieties 
may compare quite differently on Hertfordshire loam and on Lincolnshire chalk. It would 
be misleading to lump all the comparisons together when we have strong reason to suspect 
heterogeneity beforehand. In effect, prior knowledge of this kind frequently dictates the 
types of question we ask under (b), and the two are often different facets of the same problem.

(d) As an extension of the same point, we may notice that prior knowledge about the
population sometimes indicates what sort of averages to use and what sort of tests of
significance it is proper to apply. Crop-yields, for instance, are known to be distributed
in a form approaching the normal, so that arithmetic means are good estimates of parent
means and the tests based on normal theory may be applied. Accident statistics, on the
other hand, are often distributed in a modified Poisson form; income statistics in a J-shaped
form, and so forth.

(e) A specification of the population and a decision as to the precise object of the 
inquiry will usually determine certain parameters which it is required to estimate or certain 
hypotheses for test. In general the problem is one of estimation, but not necessarily so. 
In our case of pasteurised and raw milk, for instance, we should probably wish to know 
the exact amount of the difference between the effects of the two (a matter of estimation),
not merely whether a difference existed (a matter of significance). We then wish to know,
before the inquiry begins, whether the estimates we shall have are going to be accurate 
enough for our purpose ; or alternatively, if the sample is of a given size, how accurate they 
will be. It may not always be possible to answer such a question completely beforehand, 
since the sampling variances will in general depend on quantities which have to be estimated 
when the data are available, but it is always useful to consider in a general way what sort 
of magnitudes would be shown as significant and what values would leave us still in reasonable
doubt. As a rule, matters such as this are closely related to sample size.

(f) Finally, our estimates will be subject to experimental error and, in development
of the last point, we have to try to find the form of experimental design which, while answering
our questions, does so with the minimum error. From a slightly different standpoint,
if we can determine the amount of error which is admissible, the problem is to find the 
design which achieves no more than that error with the minimum expenditure of effort. 
Furthermore, we require to be able to estimate the extent of probable errors. In short, we 
require an efficient design, just as the engineer requires an efficient engine or the aircraft 
designer an efficient form of airscrew, and for exactly the same reasons. 

25.6. To sum up, our primary task in embarking on a sampling inquiry is to ascertain 
as accurately as possible what is the population under examination, and what is the information
about it which we require. If, as usually is the case, that information concerns statistical
characteristics such as means and variances, or more generally frequency-distributions,
our second task is to design an inquiry which will provide estimates of these unknown 
quantities and will, at the same time, provide estimates of their sampling error. It is not 
always possible, as we shall see later, to obtain full satisfaction in the reduction of error 
and the estimation of error simultaneously. Increased accuracy of estimation may mean
loss of precision in our estimate of sampling error, so that although we are nearer the truth
we do not know how near. There does not appear to be any single rule which will cover
all the cases that can arise. We shall refer to a particular case of some interest in 25.39. 

Stratified Sampling

25.7. We consider at the outset a case of fairly frequent occurrence in the sampling 
of existent populations. Suppose we are interested in the mean value of a variate $x$ in
some population $\Pi$, and that we know, or suspect, that the population is heterogeneous
in the sense that we can delimit sub-populations $\Pi_1, \Pi_2, \ldots, \Pi_k$ in which the distributions
according to $x$ may differ. This type of case might, for example, arise if we were sampling
the population of a town for income, there being districts, wards or even streets which are
known to be inhabited by classes living at different income-levels.

Practical considerations alone may require that we draw a prescribed portion of the
sample from each sub-population. For instance, with a town of 500,000 inhabitants it
would be most tedious to sample by using random numbers applied to the whole town.
We should probably divide the work among districts and blocks and select random samples
within the blocks. This, however, is not to be confused with the division of the town into
relatively homogeneous districts because of its heterogeneity. Either process is called
stratification. The problem we shall discuss is this: If we have decided to draw a total
sample of $n$ members, and can assign at will the number $n_i$ drawn from the $i$th stratum
$\Pi_i$, subject to the condition $\sum (n_i) = n$, how should we choose the numbers $n_i$, or need we
choose them at all? Will our estimate of the mean value of $x$ be better if we merely choose
$n$ members at random from $\Pi$, or can we improve it by controlling the numbers $n_i$ and not
merely leaving them to chance?

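25.8. Let $x_{ij}$ be the $j$th member of the sample from the $i$th sub-population, and let
the latter contain a number $N_i$ of members with mean $\mu_i$ and variance $\sigma_i^2$. If $\mu$ is the
mean of $\Pi$ we shall have

$$\mu = \frac{1}{N} \sum_{i=1}^{k} N_i \mu_i, \qquad \text{where } N = \sum_{i=1}^{k} N_i. \tag{25.1}$$

We shall now seek for parameters $\lambda_{ij}$ such that our estimator of $\mu$, say $t$, is given by

$$t = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \lambda_{ij} x_{ij}, \tag{25.2}$$

that is to say, is a linear estimator in the observed variate-values. We shall seek for that
estimator which is unbiassed and has minimum variance, i.e. for which

$$E(t) = \mu, \tag{25.3}$$

$$E\{t - E(t)\}^2 = \text{minimum}. \tag{25.4}$$

Substituting from (25.2) and (25.1) in (25.3), we find

$$E\Big\{\sum_i \sum_j \lambda_{ij} x_{ij}\Big\} = \frac{1}{N} \sum_i N_i \mu_i,$$

and since $E(x_{ij}) = \mu_i$ this gives

$$\sum_i \mu_i \sum_j \lambda_{ij} = \frac{1}{N} \sum_i N_i \mu_i. \tag{25.5}$$

For this to be generally true we must have

$$\sum_j \lambda_{ij} = \frac{N_i}{N}, \tag{25.6}$$

a first condition on the $\lambda$'s. If $\lambda_{i.}$ is the mean of $\lambda_{ij}$ in the $i$th set we have

$$\lambda_{i.} = \frac{N_i}{n_i N}. \tag{25.7}$$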

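Now consider (25.4). The variance of $t$ is the sum of $k$ variances, for the samples from
sub-populations are independent. Consider then the variance of $\sum_j \lambda_{ij} x_{ij}$, remembering
that the population of $N_i$ members is finite. We have

$$\begin{aligned}
\text{variance} &= E\Big\{\sum_j \lambda_{ij}(x_{ij} - \mu_i)\Big\}^2 \\
&= E\Big\{\sum_j \lambda_{ij}^2 (x_{ij} - \mu_i)^2 + \sum_{j \neq k} \lambda_{ij}\lambda_{ik}(x_{ij} - \mu_i)(x_{ik} - \mu_i)\Big\} \\
&= \sigma_i^2 \sum_j \lambda_{ij}^2 - \frac{\sigma_i^2}{N_i - 1} \sum_{j \neq k} \lambda_{ij}\lambda_{ik} \\
&= \frac{\sigma_i^2}{N_i - 1}\Big\{N_i \sum_j \lambda_{ij}^2 - \Big(\sum_j \lambda_{ij}\Big)^2\Big\} \\
&= \frac{\sigma_i^2}{N_i - 1}\Big\{n_i (N_i - n_i)\lambda_{i.}^2 + N_i \sum_j (\lambda_{ij} - \lambda_{i.})^2\Big\}.
\end{aligned} \tag{25.8}$$

This is clearly minimised only if

$$\lambda_{ij} - \lambda_{i.} = 0, \tag{25.9}$$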

that is, if all the A’s for any sub-population are equal. This is what we should expect on 
intuitive grounds, for there is no reason for weighting the sample members differently in 
the same sub-sample. 

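Our minimal variance, say $v$, is then given from (25.8), by summing over $i$, as

$$\begin{aligned}
v &= \sum_i \frac{\sigma_i^2 (N_i - n_i)}{N_i - 1}\, n_i \lambda_{i.}^2 \\
&= \sum_i \frac{\sigma_i^2 (N_i - n_i) N_i^2}{(N_i - 1)\, n_i N^2} \\
&= \frac{1}{N^2} \sum_i \frac{1}{N_i - 1}\,\frac{\sigma_i^2 N_i^3}{n_i} + \text{constant}.
\end{aligned}$$

This is a minimum for variations in $n_i$ subject to $\sum n_i = n$ if

$$\frac{\partial}{\partial n_i}\Big(v - p \sum n_i\Big) = 0, \tag{25.10}$$

where $p$ is an undetermined constant. This yields almost at once

$$n_i \propto \sigma_i N_i \sqrt{\frac{N_i}{N_i - 1}}. \tag{25.11}$$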
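
The step from (25.10) to (25.11) can be filled in as follows. This is only a sketch: it takes the symbols as they are defined in 25.8 ($\sigma_i^2$ for the variance of the $i$th stratum, $N_i$ and $n_i$ for its population and sample numbers) and assumes that the $n_i$ may be treated as continuous variables.

```latex
\begin{aligned}
v &= \frac{1}{N^2}\sum_i \frac{\sigma_i^2 N_i^3}{(N_i-1)\,n_i} + \text{constant},\\
\frac{\partial}{\partial n_i}\Big(v - p\sum_i n_i\Big)
  &= -\frac{\sigma_i^2 N_i^3}{N^2 (N_i-1)\, n_i^2} - p = 0,\\
n_i^2 &= -\frac{1}{p N^2}\,\frac{\sigma_i^2 N_i^3}{N_i-1}
  \qquad (p < 0),\\
n_i &\propto \sigma_i N_i \sqrt{\frac{N_i}{N_i-1}} .
\end{aligned}
```

The constant of proportionality is then fixed by the condition $\sum n_i = n$.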


25.9. If we know the population variances $\sigma_i^2$ and the numbers $N_i$ this equation
determines the numbers $n_i$; but in practice it is rather unlikely that we should know the
variances without knowing the means, in which case we should not have to sample to find
the mean of the whole population. Our result is not, however, useless. In the first place
we find for the estimator $t$

$$t = \frac{1}{N} \sum_i N_i \bar{x}_i, \tag{25.12}$$

so that the estimate is a weighted average of the sample means $\bar{x}_i$, the weights being proportional
to the population numbers $N_i$, not to the numbers $n_i$. Secondly, without knowing
the variances $\sigma_i^2$ exactly, we may sometimes reach approximations from prior knowledge
of the populations. Such values, without giving absolute accuracy, will at least represent
improvements on selecting the $n$'s by chance.
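
A minimal sketch of the estimator (25.12) may make the weighting concrete. The function name `stratified_mean` and the stratum sizes and sample values below are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def stratified_mean(stratum_sizes, samples):
    """Weighted average of the stratum sample means, weights proportional to N_i."""
    if len(stratum_sizes) != len(samples):
        raise ValueError("one sample is needed for each stratum")
    total = sum(stratum_sizes)  # N, the size of the whole population
    return sum(N_i * fmean(xs) for N_i, xs in zip(stratum_sizes, samples)) / total


# Hypothetical town of three districts with very different income levels.
sizes = [6000, 3000, 1000]
samples = [[10.0, 12.0, 11.0], [20.0, 22.0], [50.0]]

t = stratified_mean(sizes, samples)
pooled = fmean(x for xs in samples for x in xs)  # ignores the strata altogether

print(t)       # weights 0.6, 0.3, 0.1 applied to the means 11, 21, 50
print(pooled)  # weights each observation equally, whatever its stratum
```

The two figures differ because the pooled mean in effect weights the strata by the sample numbers $n_i$ rather than by the population numbers $N_i$.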
25.10. If the numbers $N_i$ are effectively infinite the formulae simplify, and, for
instance, instead of (25.11) we have

$$n_i \propto \sigma_i N_i, \tag{25.13}$$

the sample number varying with the standard deviation in the stratum concerned, as well
as its number of members.

25.11. If there is no information available at all about the variances $\sigma_i^2$ the most
reasonable course in applying (25.11) appears to be to suppose them all equal. In such
a case, for large $N_i$ we have

$$n_i \propto N_i, \tag{25.14}$$

or the sampling numbers are proportional to the population numbers. This is what we
might expect on intuitive grounds. If the populations are infinite the $n_i$'s are equal, which
again is in accordance with intuitive ideas.
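
The allocation rules of 25.8 to 25.11 can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, the stratum sizes and standard deviations are invented, and the allocations are left as real numbers rather than rounded to whole sample members.

```python
import math


def optimum_allocation(n, sizes, sigmas):
    """n_i proportional to sigma_i * N_i * sqrt(N_i / (N_i - 1)), scaled to sum to n."""
    weights = [s * N * math.sqrt(N / (N - 1)) for N, s in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]


def proportional_allocation(n, sizes):
    """n_i proportional to N_i, the rule suggested when the variances are unknown."""
    total = sum(sizes)
    return [n * N / total for N in sizes]


def variance_of_estimator(sizes, sigmas, alloc):
    """v = sum of sigma_i^2 (N_i - n_i) N_i^2 / ((N_i - 1) n_i N^2) over the strata."""
    N_total = sum(sizes)
    return sum(
        s ** 2 * (N - n_i) * N ** 2 / ((N - 1) * n_i * N_total ** 2)
        for N, s, n_i in zip(sizes, sigmas, alloc)
    )


sizes = [6000, 3000, 1000]   # hypothetical N_i
sigmas = [2.0, 5.0, 20.0]    # hypothetical sigma_i, e.g. from a pilot inquiry
n = 200

opt = optimum_allocation(n, sizes, sigmas)
prop = proportional_allocation(n, sizes)
v_opt = variance_of_estimator(sizes, sigmas, opt)
v_prop = variance_of_estimator(sizes, sigmas, prop)

print([round(x, 1) for x in opt], v_opt)
print([round(x, 1) for x in prop], v_prop)
```

With these invented figures the small but highly variable third stratum receives the largest share of the sample under the optimum rule, and the variance of the estimator is smaller than under proportional allocation.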

25.12. The above will serve as an illustration of the way in which theoretical requirements
can influence the scope of an inquiry conducted among an existent population. By
seeking for an estimator with minimum variance we have been led to expressions determining
the allocation of sample numbers among the different strata, and incidentally, of
course, we have derived expressions for the minimum variance, so that the maximum
possible precision can be ascertained. The fact that some of our results depend on unknown
constants suggests that in some circumstances it may be worth while conducting a preliminary
or “pilot” inquiry in order to estimate the unknowns and hence to improve the
precision of the main inquiry which is to follow. The possibilities of such pilot surveys
have yet to be explored, but the technique appears to merit serious investigation. 

25.13. In passing, we may mention one other topic of great practical importance on 
which theory can throw a good deal of light, that of optimum size of a sampling unit. In 
sampling a human population of a town, for instance, need we take individuals as our 
units ? It would be easier to sample households, or streets, or even whole districts ; but 
do we lose anything by this method, and if so, how much ? Furthermore, the grouping of 
individuals into units of larger size sometimes has a peculiar effect on correlations which 
may lead to erroneous conclusions, and a theoretical investigation may be required to safeguard
against error. We shall not pursue the subject further here (the sampling problem
would require a book in itself), but the reader who is interested may like to consult some
of the papers referred to at the end of the chapter. 

The Design of Experiments 

25.14. For an existent population the flexibility of sampling technique is somewhat 
limited. We are given an aggregate of values, some of which are to be extracted for scrutiny, 
and no manipulation of the sampling can tell us more than exists, so to speak, already 
inscribed upon the population itself. Consequently the main line of endeavour in such 
cases lies in estimating with the greatest accuracy (which is largely a matter of choosing 
the right statistics and minimising sampling variability), or in ensuring that sufficient 
material is available to enable the requisite comparisons to be made with significance 
(which is largely a matter of sample size and selecting the most suitable tests of significance).
Nothing can alter the population, and theory will, as a rule, only react upon the sampling
process by some such method as has already been exemplified, e.g. in dictating that the
sampling must be random, in stratifying the population before the sampling is carried out,
and in deciding how limited resources can be expended to the best advantage. 

25.15. For hypothetical populations there are often wider possibilities, for the nature 
of the inquiry may itself determine which populations are to be studied, and the populations 
may, to a certain extent, be set up at will. For instance, if we are interested in an inquiry 
into the relationship between income and size of family in the United Kingdom, the population
already exists and we cannot go outside it; whereas if we wish to discuss the effect
of a poison on bacterial growth or of a fertiliser on the yield of barley we can not only 
reproduce experimental data ad libitum but can arrange the inquiry so as to confine it to 
certain populations (e.g., by considering only a given type of bacterium in fixed nutritional 
circumstances or at fixed temperatures), or we may extend the domain of consideration as 
far as purely practical limitations will allow (e.g., by growing barley in new surroundings 
or in new climates). (This is rather a pretentious way of saying that we may experiment
in a domain which, within limits, can be assigned at will.) The statistician has a much
greater scope for ingenuity in the design of experiments than in the design of sampling
inquiries on existent populations because of the greater degree of control over the population 
under examination. 

25.16. In the classical ideal experiment, only the factors under consideration were 
allowed to vary, other conditions being kept as constant as laboratory practice would allow 
—in investigations concerning the relation between resistance and current in an electric 
circuit, for instance, attempts would be made to keep factors such as temperature and 
external magnetic effects strictly constant. It would be recognized that there would be 
residual errors which would affect the exactitude of the results, but these would be measurable
on certain assumptions.

25.17. Statistical theory can, of course, deal with such cases, but it can also go farther 
and often wishes to do so. In the first place, it frankly admits the existence not only of 
experimental error (in the sense of aberration from a “ true ” value) but of the much wider 
type of variation which gives rise to frequency-distributions in practice. Instead of isolating 
particular factors for study, it may wish to give full play to the disturbances which arise 
in practice in order to investigate what happens in “natural” conditions. For this reason,
statistical experiments are often complex in the sense that a number of factors are allowed
to vary simultaneously.

Secondly, the admission of outside influences which together make up what is generally
called experimental error implies that it should be possible to estimate the extent of such
error from the data themselves. We wish to obtain, not the functional relations between
variables which may only exist under artificial conditions, but the stochastic relations
observed in practice. 

25.18. The effect of this on experimental design is that the hypothetical population 
we consider is often a rather general one. Taking the case of trials of a new variety of 
barley as an example, we should wish to compare its yields with those of other varieties 
in different soil conditions, with different manurial treatments, in different years (so as to 
get variations in climate), and so on. Furthermore, to obtain estimates of the error due
to other factors we usually have to replicate the experiment. A great number of inter-comparisons
fall to be made, and the process of design is essentially that of finding a form
of experiment which will permit all these comparisons and yet save as much unnecessary
labour as possible.

Orthogonality 

25.19. To reduce the discussion to more concrete terms we will consider the testing 
of a new variety of barley. In order to study its behaviour under different soil conditions 
we will select a number of areas in which barley is grown and choose a block of ground in 
each. This will give us inter-soil comparisons. We will also arrange to carry the experiment
on for a period of years, so that climatic variations may also be compared. The
other factor in which we are interested is the response to certain manures, which we will
take to be dung (D), potash (K), nitrogen (N), and phosphates (P).

Consider any block at any one place in any year. We will decide on certain standard
quantities of the four manures and assume that for any manure either a dressing of this
standard amount is to be given, or it is to be withheld. This simplifies the experiment,
for then every manure either is or is not applied, and our results can be classified by simple 
dichotomies. Of course more complicated experiments can be devised to allow for different 
quantities of fertiliser, but the simpler case will be sufficient for our purposes. 

We have then set up a population which can be classified according to six qualities, 
place, time, and the application of four manures. Our results are intended to show whether 
there is any variation in yield between these conditions and various combinations of them. 
Of course, it does not follow in deductive logic that if there is significant variation from year 
to year in the particular years chosen there will always be temporal or climatic variation ; 
and similarly, if there is significant variation from place to place it does not follow that 
other soil conditions which have not been tested will show a significant variation. To 
arrive at such conclusions we have to perform an ordinary generalisation by induction. 
What we shall say, if significant results appear, is that in the regions tested, or for the years 
tested, there were significant variations, and that it therefore appears likely that soil and 
climate exert a material effect on yield; and we shall maintain this with more or less confidence
according as our experience is wider or narrower. This is the familiar inductive
inference which forms the basis of all scientific inquiry. 

25.20. Within any one block we shall wish to study the effect of manurial treatments 
not only separately but in combination. We therefore divide the block into sixteen compartments
and treat them, respectively, with no manure, D, K, N, P, DK, DN, DP, KN,
KP, NP, KNP, DNP, DKP, DKN and DKNP. Here every possible combination appears
once and only once. To compare, for instance, the mean yields in the presence or absence 
of dung we add all the eight yields for plots on which no dung was spread and compare it 
with the sum of the other eight. All the necessary comparisons can be made. 
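
The enumeration of the sixteen combinations, and the comparison of the eight plots with dung against the eight without, can be sketched as below. The yields are hypothetical: they are generated from an invented additive model with no error term, purely so that the comparison has a known answer.

```python
from itertools import product

MANURES = "DKNP"  # dung, potash, nitrogen, phosphates


def treatment_combinations(factors=MANURES):
    """All 2^4 present/absent combinations; the empty string means no manure."""
    return [
        "".join(f for f, used in zip(factors, flags) if used)
        for flags in product((False, True), repeat=len(factors))
    ]


def main_effect(yields, factor):
    """Mean yield of plots receiving the factor minus mean yield of the rest."""
    with_f = [y for label, y in yields.items() if factor in label]
    without_f = [y for label, y in yields.items() if factor not in label]
    return sum(with_f) / len(with_f) - sum(without_f) / len(without_f)


combos = treatment_combinations()

# Hypothetical additive responses to each manure (arbitrary units), base yield 20.
RESPONSE = {"D": 4, "K": 1, "N": 3, "P": 2}
yields = {label: 20 + sum(RESPONSE[f] for f in label) for label in combos}

print(len(combos))               # number of distinct combinations
print(main_effect(yields, "D"))  # recovers the invented response to dung
```

Because every other manure appears equally often with and without dung, its contribution cancels in the comparison; this is the practical meaning of the balance described above.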

Data of this kind are said to be orthogonal. Each possibility arises an equal number of
times. The reason for the use of the word is that such material is orthogonal in the sense
we have considered in the analysis of variance. We saw in Chapters 23 and 24 that where
cell-frequencies were equal the analysis was greatly simplified, and that under the customary
hypotheses the estimates of means were independent. It is not, of course, absolutely
necessary to have orthogonal data; in fact, we have shown in Chapter 24 how to deal with
the non-orthogonal case; but it is evidently a great convenience to be able to arrange
for orthogonality, and no efficiency is lost by doing so.
Replication

25.21. If, as suggested above, we divide each block into 16 plots and treat each differently,
the analysis of variance of any block will have 15 degrees of freedom; and if we
cannot ignore any of the interactions there will be no residual variance due to “error”,
that is to say we cannot estimate the reliability of our comparisons. All the 15 possible
independent comparisons may be made, but we cannot decide whether differences are
significant in the sense that they may be due to the other factors which we have agreed
to allow to bear on the experiment, such as individual soil differences from plot to plot.
If we are to estimate such “error” we must give the factors which produce it an opportunity
of varying. This may be done by replicating the experiment, that is to say, by
repeating it in the same form. For instance, suppose that we set up four blocks and divide
each into 16 plots, applying our manurial treatments to each block. Then, assuming that
there are no significant interactions between blocks and treatments (a matter which we
can test by examining the interaction terms in the variance-analysis), we shall have 63
degrees of freedom, of which 15 are assignable to treatments and their interactions and the
remaining 48 to a “residual” term, the latter providing an estimate of experimental error.
We have exemplified this process in Chapter 23. 
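
The counting of degrees of freedom in this replicated experiment is simple arithmetic, sketched below. The function name is hypothetical, and the “residual” is taken, as in the text, to be simply whatever remains after the treatment comparisons are removed.

```python
def replicated_factorial_df(n_factors, n_blocks):
    """Degrees of freedom for a two-level factorial repeated in several blocks."""
    plots_per_block = 2 ** n_factors       # one plot for each combination
    total = n_blocks * plots_per_block - 1
    treatments = plots_per_block - 1       # main effects and their interactions
    residual = total - treatments          # everything not assigned to treatments
    return {"total": total, "treatments": treatments, "residual": residual}


df = replicated_factorial_df(n_factors=4, n_blocks=4)
print(df)  # {'total': 63, 'treatments': 15, 'residual': 48}
```

With a single block the same function gives a residual of zero, which is the situation described at the start of the section.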


Randomisation 

25.22. Up to this point we have said nothing about the arrangement of our 16 plots
within the block. Suppose we divide our block into plots of equal size. Is there any 
advantage in allocating the treatments systematically, or is it preferable to assign them 
at random ? 

We shall consider the relative merits of random and systematic arrangements in more 
detail below, but we can announce the general rule now: unless there is some good reason
to the contrary, it is better to allot the treatments at random. Where possible, chance
should be given full play.

25.23. The justification for this rule in our present instance can be seen by reference
to the section on randomised blocks in 23.41. We saw there that by randomising the
allocation of plots we were able to preserve the $z$-distribution and hence to validate our
tests of significance, even where normality in the parent form was not assumed. The
process is essentially one of extending our hypothetical population. Instead of considering 
the observed yields as specimens of what might happen in repeated trials of the same variety 
of barley if the same manurial treatments were applied to the same plots, we consider the 
possible yields in repeated trials if the manurial treatments were applied in all possible 
ways to different plots. Our experiment is systematic in the sense that we prescribe a
different treatment for each plot; it is random to the extent that we allot the treatments
to plots by chance.
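
A random allotment of the sixteen treatments to the sixteen plots of a block might be sketched as follows. The function name, the 4 x 4 layout of the block and the seed are assumptions made for illustration; any source of random numbers that makes every arrangement equally likely would serve.

```python
import random
from itertools import product


def randomise_plots(treatments, rng):
    """Return a list whose entry at position i is the treatment allotted to plot i."""
    layout = list(treatments)
    rng.shuffle(layout)  # every ordering of the treatments is equally likely
    return layout


# The sixteen combinations of the four manures D, K, N, P ("-" = no manure).
treatments = [
    "".join(m for m, used in zip("DKNP", flags) if used) or "-"
    for flags in product((False, True), repeat=4)
]

rng = random.Random(1946)  # fixed seed (an arbitrary choice) so the layout can be reproduced
layout = randomise_plots(treatments, rng)

# Print the block as a 4 x 4 grid of plots.
for row in range(4):
    print("  ".join(f"{t:<4}" for t in layout[4 * row: 4 * row + 4]))
```

Each treatment still appears exactly once in the block, so the design remains systematic in the sense just described; only the positions are left to chance.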

25.24. There is one source of possible confusion here which it is desirable to remove. 
In our agricultural example complications arise because of the physical contiguity of the 
plots, and we shall see below that it is often desirable to eliminate by special designs systematic
fertility gradients in the soil. In other classes of experiment where we desire orthogonality,
the members need not be subject to this kind of effect, and often are not. Reverting
to the example of raw versus pasteurised milk which has already been mentioned, suppose 
we take a simplified case and wish to measure whether the two different milks have different 
effects on boys and girls. With a class of 40 children, 20 boys and 20 girls, we can proceed
in several ways. It is obviously useless to give raw milk to all the boys and pasteurised
milk to all the girls, for then we have no measure of the differential effect, if any, for either
sex alone. We might toss up in each case and allot raw or pasteurised milk to each child
by chance; but this would probably make the data non-orthogonal. To attain orthogonality,
we should allot 10 children to each of the four sub-groups BP, GP, BR, GR (where
B = boy, G = girl, P = pasteurised, R = raw). We then have an analysis of variance:


                                          Degrees of freedom

Between sexes                                      1

Between milks                                      1

Residual (including interactions)                 37

Total                                             39


This is analogous to a test of a cereal with two fertilisers and 10 replications. 

The question is, how should we allot the children to the four groups ? Their sex, of 
course, is determined, but the nature of the milk they receive is at choice. It is here that 
the randomisation will help. The ten children of a specified sex who receive raw milk 
should be chosen at random from the 20 available. In this instance it might be thought 
that any method would do; but it is best to avoid the risk of bias. If the children were
chosen by the teacher he might tend to select the 10 bigger boys or the 10 brighter boys.
If they were chosen alphabetically, we might get brothers and sisters automatically receiving
the same treatment; and so on. The randomisation process avoids all systematic
effects of this kind and brings us a stage nearer to obtaining an unbiassed answer to our 

Sensitivity of a Test 

25.25. In some cases, where the variate is discontinuous, the nature of the test of 
significance which we propose to apply may make a difference to the form of the experiment. 
If we are testing a certain hypothesis which can produce a specified number m of experimental
results which are acceptable as conforming to the hypothesis, whereas other
hypotheses produce a number n of other results, we clearly want to keep m as small as
possible compared with n. The ideal case, of course, is that of the “crucial” experiment
in which the hypothesis can only give one result and other hypotheses give a different 
result. The result then proves or disproves the truth of the hypothesis, and no test of 
significance arises. In statistical practice we do not as a general rule perform crucial 
experiments, but we can sometimes design an experiment so that it is more crucial, if the 
expression be allowed, than alternative methods. 

25.26. Consider, for instance, the case of a cashier who claims to be able to detect
good money from false at a glance. To test this ability we spread ten coins before him,
tell him that $p$ are good, and ask him to point them out. What number of good coins $p$
should we include among the ten?

If the cashier had no power of discrimination and there are $p$ good coins, the probability
that he would guess right by chance is

$$1 \Big/ \binom{10}{p},$$

for the total number of ways of selecting $p$ from 10 is the denominator of this fraction and
only one of them is right. Now we want to choose $p$ so as to minimise the probability of
such an event, i.e. so as to maximise $\binom{10}{p}$. This is clearly done when $p = 5$, so that we
ought to have five good and five bad coins in the set. Any other number would increase
the probability that he might be right by chance and hence decrease the sensitivity of the
experiment.
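
The choice of five good coins can be checked by direct enumeration. The sketch below simply tabulates the chance of a correct guess for each possible number of good coins; the function name is hypothetical.

```python
from math import comb


def chance_of_guessing(p, total=10):
    """Probability of picking exactly the right p coins out of `total` by pure chance."""
    return 1 / comb(total, p)


probs = {p: chance_of_guessing(p) for p in range(11)}
best_p = min(probs, key=probs.get)  # the p for which a lucky guess is least likely

for p, prob in probs.items():
    print(p, comb(10, p), round(prob, 5))

print(best_p, comb(10, best_p))  # 5 252
```

With five good coins there are 252 equally likely selections, so a cashier with no power of discrimination succeeds by chance only once in 252 trials; with one or nine good coins he would succeed once in ten.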


Latin Squares 

25.27. We now proceed to consider a different type of design, which has been freely 
applied in agriculture but may also be applied to other forms of inquiry. Suppose we 
have a variety of barley to test and five different treatments to apply. We will assume 
that replication has been considered necessary and will replicate five times, the same number 
as the treatments. We will then divide our block into 25 plots like a chessboard (though 
the plots may be rectangular and need not be exact squares, provided they are all the same 
size). Each row may be considered a replication of the five treatments, and this itself 
involves the appearance of each treatment once and only once in each row. Can we extend 
the arrangement and ensure that in addition the treatments will occur just once in each 
column ? 

The answer is affirmative, as the following example shows:

    A  B  C  D  E
    B  C  A  E  D
    C  E  D  A  B
    D  A  E  B  C
    E  D  B  C  A          (25.15)



An arrangement of this kind is called a “Latin square”. It was studied extensively by
Euler in the eighteenth century, though not of course from the statistical viewpoint. 
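
The defining property is easily verified mechanically. The sketch below, in Python, checks that every symbol occurs once in each row and once in each column; the function name `is_latin_square` is an illustrative choice of ours, and the square tested is the 5 × 5 arrangement (25.15).

```python
def is_latin_square(square):
    """True if each symbol of the first row occurs exactly once in every row and column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(n))
    return rows_ok and cols_ok

# The arrangement (25.15), one string per row.
square_25_15 = ["ABCDE", "BCAED", "CEDAB", "DAEBC", "EDBCA"]
print(is_latin_square(square_25_15))
```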



25.28. The advantage of this arrangement lies in the fact that it eliminates possible 
correlational effects due to fertility gradients in the soil or accidental circumstances which 
may exercise a “ patchy ” influence on the whole block. If we could be sure that there 
were no such influences at work, and that the soil was entirely homogeneous in the block, 
it would not matter where the treatments were placed; but by imposing the restriction 
that no treatment appears more than once in the same row or column we remove at least 
horizontal and vertical gradients from our comparisons. Suppose in fact that there were 
gradients running across the block and down it. When we work out the mean yield of the
treatment A we shall add together five values, one from each row and one from each column.
Similarly for B, so that a comparison of A and B is not affected by the systematic influences, 
which work equally on both. 
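
A small numerical illustration may make the point concrete. In the sketch below (Python) the treatment effects and the two gradients are invented figures, chosen only for illustration; the layout is the square (25.15). Each treatment mean absorbs the same total of row and column contributions, so the difference between two treatment means reproduces the difference of their effects exactly.

```python
square = ["ABCDE", "BCAED", "CEDAB", "DAEBC", "EDBCA"]   # layout (25.15)

# Hypothetical values, for illustration only.
effect = {"A": 10.0, "B": 12.0, "C": 9.0, "D": 15.0, "E": 11.0}
row_gradient = [0.0, 1.5, 3.0, 4.5, 6.0]       # fertility trend down the block
col_gradient = [0.0, -2.0, -4.0, -6.0, -8.0]   # fertility trend across the block

def treatment_means(square, effect, row_gradient, col_gradient):
    """Mean yield per treatment when yield = treatment effect + row term + column term."""
    totals = {t: 0.0 for t in effect}
    for r, row in enumerate(square):
        for c, t in enumerate(row):
            totals[t] += effect[t] + row_gradient[r] + col_gradient[c]
    n = len(square)
    return {t: s / n for t, s in totals.items()}

means = treatment_means(square, effect, row_gradient, col_gradient)
diff_AB = means["A"] - means["B"]
print(diff_AB)
```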

It is not, of course, true that the Latin square arrangement eliminates every effect due 
to soil heterogeneity. There might be systematic effects running diagonally which might 
still remain. It is, however, clear that in removing the effects in two perpendicular
directions we have substantially improved the comparison of mean yields as compared with
a systematic arrangement. 


25.29. The analysis of variance of a p × p Latin square may be carried out in the
following form:

        Sum of squares              d.f.
        Between rows                p - 1
        Between columns             p - 1
        Between treatments          p - 1
        Residual                    (p - 1)(p - 2)
        Total                       p^2 - 1              (25.16)

and the four constituent sums are, on the hypothesis of homogeneity, distributed as $v\chi^2$
independently. Before proving this result we will consider an example.

Example 25.1 (from Thomson, Brit. J. Educ. Psych., 1941, 11, 135 ; data by S. D. Nisbet). 

A set of children were divided into four equal groups and each group was given four 
lists of words to test spelling ability. Each list formed one of four different types of test 
which we denote by A, B, C, D. The arrangement of the experiment is shown in the 
following table, together with the total scores of the corresponding groups :— 


                                  Groups of children
                           1        2        3        4      Totals
        Lists of    1     A 81     B 41     C 44     D 53      219
        words       2     D 38     A 97     B 42     C 49      226
                    3     C 31     D 43     A 67     B 36      177
                    4     B 57     C 33     D 43     A 81      214
                 Totals    207      214      196      219      836


For instance, the first group of children had the first list of test A, the second of test 
D, and so on. No group had the same lists as another group, and each list was used exactly 
once. The scores (corresponding to yields in the agricultural case) were in fact the number 
of words spelled wrongly in a prior test but correctly in this test. 

The above table, of course, does not represent anything corresponding to the physical 
layout of an agricultural experiment, but it shows how a similar object can be secured to 
the avoidance of contiguous effects. Since it is possible that some relationship may exist 
between the lists of words and the tests (e.g. by accident one list might be particularly 
unsuitable for a test), we wish to ensure that not only will each group of children have 
each of the four tests, but that no list shall be given more than once and every list at least 
once. This is precisely what the Latin square accomplishes. The fact that the diagonal
arrangement of the letters is systematic does not affect the present inquiry, though in an
agricultural experiment a systematic diagonal fertility gradient might affect comparisons
between treatments.

An analysis of variance on the usual lines gives the following results :— 


                                  Sum of Squares     d.f.     Quotient
        Lists (rows)                   359.5           3        119.83
        Groups (columns)                74.5           3         24.83
        Tests (treatments)            4626.5           3       1542.17
        Residual                       606.5           6        101.08
        Totals                        5667.0          15


The differences between lists are evidently not significant, from which we should conclude
that they appear to be on a par so far as these tests are concerned. The quotient due to
groups indicates that the children are more alike than chance would lead us to expect, but
not significantly so, for the variance ratio 101.08/24.83 = 4.1, $\nu_1 = 6$, $\nu_2 = 3$, is not
significant. On the other hand, the quotient due to tests is very significant, the ratio
1542.17/101.08 = 15.3, $\nu_1 = 3$, $\nu_2 = 6$, being beyond the 1-per-cent point. We conclude
that there do exist differences between the tests.
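
The arithmetic of Example 25.1 can be reproduced by a short routine. The following Python sketch is ours, not the author's working; it assumes that the third-row scores for tests A and B are 67 and 36, these being the values required by the marginal totals 177, 196, 219 and by the printed sums of squares. The function name `latin_square_anova` is illustrative.

```python
layout = ["ABCD", "DABC", "CDAB", "BCDA"]   # rows = lists of words, columns = groups
scores = [
    [81, 41, 44, 53],
    [38, 97, 42, 49],
    [31, 43, 67, 36],   # assumed readings 67 and 36 (see the lead-in)
    [57, 33, 43, 81],
]

def latin_square_anova(layout, scores):
    """Sums of squares, degrees of freedom and mean squares for a p x p Latin square."""
    p = len(layout)
    grand = sum(map(sum, scores))
    correction = grand ** 2 / p ** 2
    row_tot = [sum(row) for row in scores]
    col_tot = [sum(scores[r][c] for r in range(p)) for c in range(p)]
    trt_tot = {}
    for r in range(p):
        for c in range(p):
            trt_tot[layout[r][c]] = trt_tot.get(layout[r][c], 0) + scores[r][c]
    ss = {
        "rows": sum(t * t for t in row_tot) / p - correction,
        "columns": sum(t * t for t in col_tot) / p - correction,
        "treatments": sum(t * t for t in trt_tot.values()) / p - correction,
        "total": sum(x * x for row in scores for x in row) - correction,
    }
    ss["residual"] = ss["total"] - ss["rows"] - ss["columns"] - ss["treatments"]
    df = {"rows": p - 1, "columns": p - 1, "treatments": p - 1,
          "residual": (p - 1) * (p - 2), "total": p * p - 1}
    ms = {k: ss[k] / df[k] for k in ("rows", "columns", "treatments", "residual")}
    return ss, df, ms

ss, df, ms = latin_square_anova(layout, scores)
for name in ("rows", "columns", "treatments", "residual"):
    print(f"{name:<12}{ss[name]:>9.1f}{df[name]:>4}{ms[name]:>10.2f}")
print("variance ratio for tests:", round(ms["treatments"] / ms["residual"], 1))
```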

Construction of Latin Squares 

25.30. The number of possible Latin squares of order p is very large for high values
of p. There are, for example, 576 squares of order 4; 161,280 squares of order 5; 812,851,200
of order 6 and 61,479,419,904,000 of order 7. Up to this order they have been enumerated.
Although many examples of squares of higher orders are known, the problem of enumeration
for p ≥ 8 awaits solution. Details and examples will be found in Fisher and Yates'
Statistical Tables.

By interchanging rows and columns the square can always be brought to a form in
which the top row and left-hand column are in the order ABC, etc. It is then said to be
a “ standard square ”. For instance, there are four standard squares of the fourth order:

        A B C D        A B C D        A B C D        A B C D
        B A D C        B C D A        B D A C        B A D C
        C D B A        C D A B        C A D B        C D A B
        D C A B        D A B C        D C B A        D C B A          (25.17)

From each of these, 144 (= 4! 3!) squares may be derived by permuting all columns and
all rows except the first. (There is no point in permuting the first row, because the result
would be a repetition of squares already obtained with an interchange of the letters
A ... D, not an essentially different layout.) The total number of squares, as stated
above, is therefore 4 × 144 = 576.
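
For the fourth order the count is small enough to be confirmed by exhaustive search. The following Python sketch builds every square row by row, rejecting any row that repeats a symbol in a column; the symbols are coded 0 to 3 for A to D, and the function names are ours.

```python
from itertools import permutations
from math import factorial

def latin_squares(n):
    """Yield every Latin square of order n as a tuple of row tuples (symbols 0..n-1)."""
    rows = list(permutations(range(n)))

    def extend(partial):
        if len(partial) == n:
            yield tuple(partial)
            return
        for row in rows:
            # A new row is admissible if it clashes with no earlier row in any column.
            if all(row[c] != prev[c] for prev in partial for c in range(n)):
                yield from extend(partial + [row])

    yield from extend([])

def is_standard(square):
    """True if the top row and the left-hand column are both in natural order."""
    n = len(square)
    order = tuple(range(n))
    return square[0] == order and tuple(row[0] for row in square) == order

all_squares = list(latin_squares(4))
standard = [s for s in all_squares if is_standard(s)]
print(len(all_squares), len(standard), len(standard) * factorial(4) * factorial(3))
```

The same routine will serve for order 5, though the search is then noticeably slower; for higher orders a brute-force search of this kind soon becomes impracticable.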

It is only necessary to specify the standard squares. To select a Latin square at 
random we choose a standard form at random and then permute rows and columns at 
random, the randomising process being most conveniently carried out by Sampling 
Numbers. For squares of order 8 or more, where the standard types have not been enumerated,
we can only choose from those standard squares which are known, and hence select one at
random from a restricted set of all possible squares.
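
The selection procedure for order 4 may be sketched as follows in Python. The pseudo-random generator of the standard library stands in for the table of Sampling Numbers, which is an assumption of ours; the function name `random_latin_square` is likewise illustrative. The four standard squares are those of (25.17).

```python
import random

STANDARD_SQUARES_4 = [
    ["ABCD", "BADC", "CDBA", "DCAB"],
    ["ABCD", "BCDA", "CDAB", "DABC"],
    ["ABCD", "BDAC", "CADB", "DCBA"],
    ["ABCD", "BADC", "CDAB", "DCBA"],
]

def random_latin_square(standard_squares, rng):
    """Pick a standard square at random, then permute all columns and all rows but the first."""
    base = rng.choice(standard_squares)
    n = len(base)
    cols = list(range(n))
    rng.shuffle(cols)                              # random order of all columns
    rows = [0] + rng.sample(range(1, n), n - 1)    # first row fixed, the others permuted
    return ["".join(base[r][c] for c in cols) for r in rows]

square = random_latin_square(STANDARD_SQUARES_4, random.Random(2024))
print("\n".join(square))
```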

Analysis of Variance for Latin Squares

25.31. We must now justify our assertion that the Latin square may be analysed
in the form (25.16), and that the z-test applies to the variance ratios which arise in the
analysis.

For an ordinary two-way classification we have

$$ \sum (x_{jk} - \bar{x}_{..})^2 = \sum (\bar{x}_{j.} - \bar{x}_{..})^2 + \sum (\bar{x}_{.k} - \bar{x}_{..})^2 + \sum (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2 . $$

Thus, if $\bar{x}_r$ is the mean of rows and $\bar{x}_c$ that of columns in the Latin square, we have, writing
$\bar{x}$ for $\bar{x}_{..}$,

$$ \sum (x_{rc} - \bar{x})^2 = \sum (\bar{x}_r - \bar{x})^2 + \sum (\bar{x}_c - \bar{x})^2 + \sum (x_{rc} - \bar{x}_r - \bar{x}_c + \bar{x})^2 \tag{25.18} $$

and the three parts on the right are distributed independently as $v\chi^2$ with p - 1, p - 1 and
(p - 1)(p - 1) degrees of freedom respectively.

Now

$$ \sum (x_{rc} - \bar{x}_r - \bar{x}_c + \bar{x})^2 = \sum (\bar{x}_t - \bar{x})^2 + \sum (x_{rc} - \bar{x}_r - \bar{x}_c - \bar{x}_t + 2\bar{x})^2 + 2 \sum (\bar{x}_t - \bar{x})(x_{rc} - \bar{x}_r - \bar{x}_c - \bar{x}_t + 2\bar{x}) \tag{25.19} $$

where $\bar{x}_t$ is the mean of treatments.

Consider the cross-product term in (25.19). The summation takes place over all $p^2$
values in the Latin square. Let us confine our attention to the summation for some particular
treatment. For this summation the factor $\bar{x}_t - \bar{x}$ is constant. Summation for
the other factor gives

$$ \sum (x_{rc} - \bar{x}_r - \bar{x}_c - \bar{x}_t + 2\bar{x}) = p\bar{x}_t - \sum \bar{x}_r - \sum \bar{x}_c - p\bar{x}_t + 2p\bar{x} \tag{25.20} $$

and since each treatment occurs once in each row and column,

$$ \sum \bar{x}_r = p\bar{x}, \qquad \sum \bar{x}_c = p\bar{x}, \tag{25.21} $$

and hence the sum (25.20) vanishes.

Thus the cross-product in (25.19) vanishes also and we have

$$ \sum (x_{rc} - \bar{x})^2 = \sum (\bar{x}_r - \bar{x})^2 + \sum (\bar{x}_c - \bar{x})^2 + \sum (\bar{x}_t - \bar{x})^2 + \sum (x_{rc} - \bar{x}_r - \bar{x}_c - \bar{x}_t + 2\bar{x})^2 . \tag{25.22} $$

This gives us the analysis of the sums of squares, and it only remains to show that the third
term on the right in (25.22) is independent of the fourth. It will then follow that the four
terms are distributed independently with p - 1, p - 1, p - 1 and (p - 1)(p - 2) degrees
of freedom.
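
The algebraic identity (25.22), and the vanishing of the cross-product in (25.19), may be checked numerically on arbitrary data. The Python sketch below does so for the square (25.15) with pseudo-random normal values; the data and the function name `latin_decomposition` are ours and purely illustrative.

```python
import random

def latin_decomposition(layout, x):
    """Return the total, row, column, treatment and residual sums of squares
    of (25.22), together with the cross-product sum of (25.19)."""
    p = len(layout)
    cells = [(r, c) for r in range(p) for c in range(p)]
    mean = sum(map(sum, x)) / p ** 2
    row_mean = [sum(x[r]) / p for r in range(p)]
    col_mean = [sum(x[r][c] for r in range(p)) / p for c in range(p)]
    trt_sum = {}
    for r, c in cells:
        trt_sum[layout[r][c]] = trt_sum.get(layout[r][c], 0.0) + x[r][c]
    trt_mean = {t: s / p for t, s in trt_sum.items()}

    def resid(r, c):
        return x[r][c] - row_mean[r] - col_mean[c] - trt_mean[layout[r][c]] + 2 * mean

    total = sum((x[r][c] - mean) ** 2 for r, c in cells)
    rows = sum((row_mean[r] - mean) ** 2 for r, c in cells)
    cols = sum((col_mean[c] - mean) ** 2 for r, c in cells)
    trts = sum((trt_mean[layout[r][c]] - mean) ** 2 for r, c in cells)
    residual = sum(resid(r, c) ** 2 for r, c in cells)
    cross = sum((trt_mean[layout[r][c]] - mean) * resid(r, c) for r, c in cells)
    return total, rows, cols, trts, residual, cross

rng = random.Random(0)
layout = ["ABCDE", "BCAED", "CEDAB", "DAEBC", "EDBCA"]
x = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(5)]
total, rows, cols, trts, residual, cross = latin_decomposition(layout, x)
print(total - (rows + cols + trts + residual), cross)
```

Both printed quantities differ from zero only by rounding error, whatever values are inserted in the cells.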

The required property of independence can be established directly, but it also follows 
from considerations of symmetry in the Latin square which have an interest of their own. 
We have regarded the square as composed of rows and columns, with treatments allotted 
in a certain way ; but by rearrangement we can equally well regard it as composed of rows 
and treatments with columns allocated in a certain way. For instance, if we take the
first standard square in (25.17) we may write it:

                        Treatment:
                        A      B      C      D
        Rows:    1     C_1    C_2    C_3    C_4
                 2     C_2    C_1    C_4    C_3
                 3     C_4    C_3    C_1    C_2
                 4     C_3    C_4    C_2    C_1

where, for instance, treatment A occurs in row 1, column 1 (C_1), row 2, column 2 (C_2), and
so on. This, of course, is not a physical layout, but that is immaterial for present purposes.
It follows that since the sum of squares between columns is independent of the residual in
(25.22), so also is that between treatments.

The variance analysis then takes the form 


                        Sum of Squares                                                        d.f.
        Rows            $\sum (\bar{x}_r - \bar{x})^2$                                          p - 1
        Columns         $\sum (\bar{x}_c - \bar{x})^2$                                          p - 1
        Treatments      $\sum (\bar{x}_t - \bar{x})^2$                                          p - 1
        Residual        $\sum (x_{rc} - \bar{x}_r - \bar{x}_c - \bar{x}_t + 2\bar{x})^2$        (p - 1)(p - 2)
        Totals          $\sum (x_{rc} - \bar{x})^2$                                             p^2 - 1          (25.23)


25.32. The above form provides a homogeneity test of the usual kind. If the test
proves significant of heterogeneity we may, in the usual way, consider the hypothesis that

$$ x_{rc} = a_r + b_c + c_t + \xi_{rc} \tag{25.24} $$

where $\xi_{rc}$ is normally distributed about zero mean. We leave it to the reader to show, as
in Chapter 23, that in such an event the residual mean square is an unbiassed estimate of
the variance of $\xi$ with (p - 1)(p - 2) degrees of freedom.


25.33. As in the case of randomised blocks, it appears that under certain general 
conditions the z-distribution is reproduced approximately for fixed values which are
permuted in all the permissible ways consistent with the Latin square design. We omit an
investigation into this result (for which see Welch, 1937) as the algebra is considerably 
more complicated than for randomised blocks. The result has been confirmed by a limited 
number of experiments. 


Graeco-Latin and Orthogonal Squares

25.34. If the two squares

        A B C D        A B C D
        B A D C        C D A B
        C D A B        D C B A
        D C B A        B A D C          (25.25)

are superposed we have the arrangement

        AA BB CC DD
        BC AD DA CB
        CD DC AB BA
        DB CA BD AC                     (25.26)


in which every possible pair of letters (X Y being regarded as different from YX) appears 
just once. Such a pair of squares is said to be orthogonal. The form (25.26) is sometimes 
written with Greek letters instead of the second Roman set; hence the name of Graeco-Latin
square. It is also possible to superpose a third factor which we will denote by the
numerals 1-4 in such a way that each combination of any pair of types occurs just
once, e.g.

        Aα1   Bβ2   Cγ3   Dδ4
        Bγ4   Aδ3   Dα2   Cβ1
        Cδ2   Dγ1   Aβ4   Bα3
        Dβ3   Cα4   Bδ1   Aγ2           (25.27)


Complete sets of orthogonal squares (i.e. those in which there are p - 1 factors for a p × p
square) are known for all prime p and for p = 4, 8 and 9. Curiously, there is no set for
p = 6. Up to and including p = 7 they have been enumerated.

We shall not enter here into the use of these squares in experimental design. They 
are generalisations of the Latin square in which, by suitable arrangements, several factors 
can be tried out simultaneously, so that all possible combinations of pairs occur an equal 
number of times. 
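
Orthogonality is a property that can be tested by simple counting: two squares of order p are orthogonal when their superposition exhibits all p^2 ordered pairs. The Python sketch below applies this test to the three squares superposed in (25.27); the letters a, b, g, d are our stand-ins for the Greek letters, and the function name `orthogonal` is illustrative.

```python
from itertools import combinations

latin   = ["ABCD", "BADC", "CDAB", "DCBA"]
greek   = ["abgd", "gdab", "dgba", "badg"]   # a, b, g, d = alpha, beta, gamma, delta
numeral = ["1234", "4321", "2143", "3412"]

def orthogonal(s1, s2):
    """True if every ordered pair of symbols occurs exactly once on superposition."""
    n = len(s1)
    pairs = {(s1[r][c], s2[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n

for a, b in combinations([latin, greek, numeral], 2):
    print(orthogonal(a, b))
```

Since p - 1 = 3 for a square of order 4, these three squares form a complete set in the sense just described.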


Confounding 

25.35. It will be evident that if we wish to consider in full a classification according 
to several variates, particularly with replications, the number of individual members in 
the sample may be very large. For instance, if we wish to test a variety of barley with 
three different applications of four types of fertiliser, there must be 81 yields even without 
replication, if we want to make all the comparisons possible. Physical considerations may 
make a layout of an experiment on such a scale impossible. The difficulty is possibly more 
serious in experiments on expensive animals such as cows. 

Where economy in the size of sample is a very material factor we may be able to reduce 
the sample at the expense of sacrificing some of the less important comparisons. For 
example, to consider once again the case of barley and the effect of fertilisers : we shall 
undoubtedly wish to compare yields of D and not-D, K and not-K, P and not-P, N and 
not-N. We may also wish to compare first-order interactions of the type DK and not-D, K. 
But it is quite possible that interactions of higher order, such as the effect of dung in the 
presence of two other fertilisers, are negligible. Where we are prepared to assume that this 
is so, on the basis of prior evidence or otherwise, we can dispense with certain information 
and still make the comparisons we wish while retaining properties of orthogonality.

25.36. Consider, as an illustration, an experiment with three fertilisers, each of which
is applied or not applied, say N, P and K, and four replications. In the ordinary way
there would be 32 plots and we should have an analysis of variance as follows, assuming
that block-treatment interactions may be regarded as part of the residual:

        Sum of squares        d.f.
        Blocks                  3
        N                       1
        P                       1
        K                       1
        NP                      1
        NK                      1
        PK                      1
        NPK                     1
        Residual               21
        Total                  31
Now suppose that we divide our main blocks into two sub-blocks, the first containing
the treatments

        O (None), NP, NK, PK          (25.28)

and the second the treatments

        N, P, K, NPK                  (25.29)

We may then analyse the variance as follows, regarding the sub-blocks as blocks of four
plots each:

        Sum of squares        d.f.
        Blocks                  7
        N                       1
        P                       1
        K                       1
        NP                      1
        NK                      1
        PK                      1
        Residual               18
        Total                  31


In fact, if we wish to compare the yields with N and those without N, i.e.

                N + NPK + NP + NK
        with    O + PK + P + K,

it will be seen that we add two members from (25.28) and two from (25.29), so the difference
is not affected by block differences; and similarly for the other comparisons. Such a
design is said to be balanced, and the interaction NPK is confounded with block-differences,
since in the eight blocks it cannot now be isolated from block effects. The advantage of 
the second design over the first is that, without losing anything appreciable in comparisons 
between treatments, we have gained a good deal in the assessment of block effects ; for the 
residual has only declined from 21 to 18 d.f. whereas the sum of squares between blocks 
has increased from 3 to 7 d.f. 
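
The way in which the split into sub-blocks leaves six comparisons clear and sacrifices the seventh can be exhibited by writing out the contrast coefficients. In the Python sketch below each treatment combination receives the coefficient +1 or -1 according to the usual rule (the product, over the factors in the effect, of +1 if the factor is applied and -1 if it is not); this sign convention, and the function names, are assumptions of ours for the purpose of illustration.

```python
from itertools import product

FACTORS = "NPK"
# The eight treatment combinations, each as the set of fertilisers applied.
combos = [frozenset(f for f, on in zip(FACTORS, bits) if on)
          for bits in product([0, 1], repeat=3)]

def label(combo):
    """Conventional name of a combination, with O for the untreated plot."""
    return "".join(f for f in FACTORS if f in combo) or "O"

def sign(effect, combo):
    """Coefficient (+1 or -1) of `combo` in the contrast for `effect`."""
    s = 1
    for f in effect:
        s *= 1 if f in combo else -1
    return s

sub_block_1 = [c for c in combos if len(c) % 2 == 0]   # O, NP, NK, PK   (25.28)
sub_block_2 = [c for c in combos if len(c) % 2 == 1]   # N, P, K, NPK    (25.29)

effects = ["N", "P", "K", "NP", "NK", "PK", "NPK"]
within_block_sums = {
    e: (sum(sign(e, c) for c in sub_block_1), sum(sign(e, c) for c in sub_block_2))
    for e in effects
}
for e, sums in within_block_sums.items():
    print(e, sums)
```

For the main effects and first-order interactions the coefficients sum to zero within each sub-block, so that block differences cancel; for NPK every plot of one sub-block carries the same sign, and the contrast is indistinguishable from the difference between sub-blocks.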


25.37. The ideas of orthogonality, randomisation, balance and confounding have 
been developed to an advanced degree and with great ingenuity, particularly by Fisher 
and Yates. The slight sketch we have given of the methods in this chapter is intended to 
be no more than illustrative of the way in which the theory of experimental design is capable 
of development, at least in certain fields, and the manner in which efficiency may be imported 
into a practical inquiry by a due regard to theoretical requirements of the design. For a 
comprehensive account of this branch of the subject the reader should consult Fisher’s 
Statistical Methods and Design of Experiments, Yates (1937b), and a useful introductory
account by Goulden (1939). At this point we leave these particular topics and return to 
certain general matters. 

Design and Randomisation 

25.38. Whenever an inference is to be made, and particularly where hypothetical 
populations are concerned, the reader will find it useful to ask himself what precisely is the 
population under consideration. We can illustrate the point very usefully by discussing 
a subject on which there has recently been difference of authoritative opinion: that of
occasional conflict between the requirements of balancing and randomisation.

25.39. Consider in the first place the testing of a cereal under two treatments, denoted
by A and B; and to simplify matters as much as possible, suppose we are to sow eight
plots in a straight line. In what order shall we allot the treatments?

If the plots are not too large so that the row covers a big area, it is quite possible that 
there may be a trend of fertility in the soil itself which will affect yields differentially and 
hence interfere with comparisons which we might make. Suppose that we do wish to 
guard against a fertility gradient so far as possible. We might then decide on one of the 
“ balanced ” arrangements : 

        AABBBBAA          (25.30)

        ABBAABBA          (25.31)

        ABABBABA          (25.32)

As will be easily seen, if there is a linear gradient in fertility along the row the means of
A and B treatments respectively will be affected to the same extent and hence their difference
unaffected. For instance, consider (25.30) and suppose the linear gradient is represented
by an additive factor q + kp, k = 1, ..., 8. On the hypothesis that the remaining
effect consists of a constant a for A-treatments with a normal residual $\xi$, and similarly
for B, the yields are

        A-treatments:  $q + p + a + \xi_1$,  $q + 2p + a + \xi_2$,  $q + 7p + a + \xi_7$,  $q + 8p + a + \xi_8$

        B-treatments:  $q + 3p + b + \xi_3$,  $q + 4p + b + \xi_4$,  $q + 5p + b + \xi_5$,  $q + 6p + b + \xi_6$

with means

$$ \tfrac{1}{4}(4q + 18p) + a + \tfrac{1}{4}(\xi_1 + \xi_2 + \xi_7 + \xi_8) $$

$$ \tfrac{1}{4}(4q + 18p) + b + \tfrac{1}{4}(\xi_3 + \xi_4 + \xi_5 + \xi_6) $$

respectively. The difference of these two is independent of q and p.
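
The balance of these arrangements against a linear gradient amounts to the statement that the plot numbers carrying A and those carrying B have the same sum, namely 18. The Python sketch below computes the contribution of the gradient term q + kp to the difference of the two treatment means; the function name `gradient_bias` is ours, and the unbalanced order AAAABBBB is included only for contrast.

```python
def gradient_bias(arrangement, q=0.0, p=1.0):
    """Mean of the gradient term q + k*p over the A plots minus its mean over the B plots."""
    a = [q + k * p for k, t in enumerate(arrangement, start=1) if t == "A"]
    b = [q + k * p for k, t in enumerate(arrangement, start=1) if t == "B"]
    return sum(a) / len(a) - sum(b) / len(b)

for arrangement in ("AABBBBAA", "ABBAABBA", "ABABBABA", "AAAABBBB"):
    print(arrangement, gradient_bias(arrangement, q=3.0, p=1.0))
```

For the three balanced orders the bias is zero whatever q and p may be; for AAAABBBB it is -4p, so that the whole of the gradient enters the comparison.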

25.40. The alternative procedure in allotting treatments would be to distribute
them at random. Such balanced arrangements as (25.30)-(25.32) might then arise by
chance. But we might also get such an arrangement as

        AAAABBBB          (25.33)

What are we to do in such circumstances? If we reject this arrangement we are rejecting
the random allocation of treatments in favour of systematisation. If we accept it we
know quite well that a fertility gradient, if it exists, will invalidate the inquiry.

The reader will no doubt agree that, if other things are equal, the balanced arrangement
is better than the arrangement (25.33). What we have to examine is whether other
things are equal; in short, whether in rejecting randomisation we have lost anything
useful in the testing of significance.

25.41. Consider a rather more general case in which an experimental area is laid
out in p blocks of q treatments each. If the subscript j refers to blocks and k to treatments,
we have the usual analysis with sum of squares between blocks (p - 1 d.f.), between
treatments (q - 1 d.f.), and residual ((p - 1)(q - 1) d.f.).

Now we have seen that if the individual plot-yield can be regarded as a block effect
plus treatment effect plus a normal residual with constant variance from plot to plot,
the significance of treatment effects can be judged from the z-test in the usual way by
comparing sum of squares between treatments with the residual sum of squares. This
is true whether treatments are allocated at random or not.

But suppose we wish to adopt the alternative viewpoint of 23.41 and make the inference
in the set of values obtained by permuting the observed values. These permutations
will not affect the block means or the total mean, and hence the sum of squares between
blocks remains constant. The remaining part of the analysis may be written:

                        Sum of Squares                                                              d.f.
        Treatment       $S_1 = \sum (\bar{x}_{.k} - \bar{x}_{..})^2$                                  q - 1
        Residual        $S_2 = \sum (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2$          (p - 1)(q - 1)
        Totals          $S_3 = \sum (x_{jk} - \bar{x}_{j.})^2$                                        p(q - 1)          (25.34)

Rather remarkably, the z-test holds for the ratio

$$ \frac{S_1}{q - 1} \cdot \frac{(p - 1)(q - 1)}{S_2}, $$

provided that treatments are allocated at random, independently of the distribution of
residual effects in individual plots.

25.42. Consider, then, the population of values, $(q!)^{p-1}$ in number, obtained by permuting
the observed values. The total sum of squares $S_3$ in (25.34) is the same for all
members. Consequently if $S_1$ is too great, $S_2$ must be too small and vice versa; and in
general, if we confine ourselves to certain layouts and reject others, all the possible values
of $S_1$ cannot appear. It is this fact which has been seized on by advocates of randomisation.
They point out that for balanced layouts $S_1$ tends to be smaller than for random
layouts (a conclusion supported by experiment); consequently that the test of significance
is invalidated and the estimate of error $S_2$ too big. The difference between the two modes
of thought may be expressed briefly in this way: with balanced layouts the real error is
reduced but the estimate of error is too large, so that the significance of the result is more
in doubt; whereas with random layouts the estimate of error is exact but the error itself
may be larger. The question is whether one prefers to be nearer the truth without knowing
how near, or farther from the truth with a knowledge of the limits of error.
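
The permutation population is small enough to be written out in full for a modest experiment. The Python sketch below takes three blocks of three plots (the yields are invented for illustration), holds the first block fixed, permutes the values within each of the others, and records $S_1$, $S_2$ and $S_3$ for every member; the function names are ours.

```python
from itertools import permutations, product

def sums_of_squares(blocks):
    """blocks[j][k] is the yield in block j allotted to treatment k.
    Returns (S1, S2, S3) as defined in (25.34)."""
    p, q = len(blocks), len(blocks[0])
    grand = sum(map(sum, blocks)) / (p * q)
    block_mean = [sum(b) / q for b in blocks]
    trt_mean = [sum(b[k] for b in blocks) / p for k in range(q)]
    cells = [(j, k) for j in range(p) for k in range(q)]
    s1 = sum((trt_mean[k] - grand) ** 2 for j, k in cells)
    s2 = sum((blocks[j][k] - block_mean[j] - trt_mean[k] + grand) ** 2 for j, k in cells)
    s3 = sum((blocks[j][k] - block_mean[j]) ** 2 for j, k in cells)
    return s1, s2, s3

def permutation_population(blocks):
    """Every layout obtained by permuting values within all blocks but the first."""
    first = tuple(blocks[0])
    return [sums_of_squares([first, *rest])
            for rest in product(*(permutations(b) for b in blocks[1:]))]

# Hypothetical yields: 3 blocks (rows) by 3 treatments (columns).
observed = [(12.1, 14.3, 11.0), (10.4, 13.9, 10.8), (13.2, 15.1, 11.7)]
p, q = len(observed), len(observed[0])

population = permutation_population(observed)
ratios = [(s1 / (q - 1)) / (s2 / ((p - 1) * (q - 1))) for s1, s2, _ in population]

s1_obs, s2_obs, _ = sums_of_squares(observed)
ratio_obs = (s1_obs / (q - 1)) / (s2_obs / ((p - 1) * (q - 1)))
# Proportion of the permutation population with a ratio at least as large as that observed.
p_value = sum(r >= ratio_obs - 1e-12 for r in ratios) / len(ratios)
print(len(population), p_value)
```

The population has $(3!)^2 = 36$ members, $S_3$ is the same for all of them, and $S_1 + S_2 = S_3$ throughout, so that a large $S_1$ necessarily goes with a small $S_2$.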

25.43. For details of the controversy on this topic the reader may consult the papers
referred to at the end of the chapter. It brings into prominence an important question
of inference which can only be decided by the experimenter himself. If he chooses to
regard any act of experimentation as one of a large population of such acts, to be carried
out by himself or other workers, he may prefer randomisation in all circumstances,
notwithstanding that every now and again he will hit by chance on a design which he knows
is likely to give misleading results. But if he cannot take this very detached attitude (and
most experimenters, being human, would think it poor compensation that their own errors
are balanced by the better luck of other people) then he will prefer to design a balanced
layout, even if the exactitude of his tests of significance is impaired.

25.44. We must, however, not leave the reader with the impression that the
desiderata of both schools of thought are totally incompatible. It frequently happens that 
one can select a design which is both balanced and random. The Latin square is a good 
example. By imposing the restriction that a treatment must not appear more than once 
in a row or column we remove to some extent the interference of fertility gradients ; by 
requiring that it shall appear just once we balance the design; and by leaving the rest 
of the layout to be determined by a random selection from all possible Latin squares of 
that order we randomise so as to reproduce the distribution of the variance ratio in the 
required form, thus, as “ Student ” remarked, “ conforming to all the principles of allowed 
witchcraft ”. 

EXERCISES

25.1. A population is given by specifying the frequencies in comparatively narrow
ranges of one variate, the frequency in the ith range being $N_i$ and ranges being of equal
width. Show that if the population frequencies are large, the best estimator of the mean
of a second variate which is linearly related to the first (in the sense of the unbiassed estimator
of minimum variance) in a sample obtained by taking $n_i$ members from the ith range is
given when $n_i$ is proportional to $N_i$.

25.2. Extend the result of the previous exercise to the case where ranges are of 
unequal width. 

If the number of farms in England and Wales is known in the acreage ranges 0-49,
50-99, 100-199, 200-499, 500 and over, what sampling proportions would you take in the
various ranges to estimate the total acreage under wheat?

25.3. If a variate $\xi$ can be regarded as the sum of a systematic component $\xi(x)$ and
an uncorrelated random component $\epsilon_1$, and $\eta$ similarly as $\eta(x) + \epsilon_2$, and if the random
components are uncorrelated with each other, show that

$$ r(\xi, \eta) = \frac{\operatorname{cov}\{\xi(x), \eta(x)\}}{\{(\operatorname{var} \xi(x) + \operatorname{var} \epsilon_1)(\operatorname{var} \eta(x) + \operatorname{var} \epsilon_2)\}^{1/2}} . $$

Hence, if a population is divided into strata the correlation between $\xi$ and $\eta$ for these strata
will, in general, be less than that obtained by combining strata to obtain larger units;
and as the strata are further subdivided the correlation between $\xi$ and $\eta$ tends to zero.

(Spearman, 1907, Am. J. Psych., 18; Wold, 1938a.)

25.4. Illustrate the effect of the foregoing exercise by calculating the correlation 
coefficients for the data of Table 14.4 (vol. I, p. 333), (a) by adding the variates in pairs 
and so obtaining 24 values; (b) by repeating the operation and obtaining 12 values;
and (c) by repeating the operation and so obtaining 6 values. 

25.5. (Markoff's theorem.) Consider a sample of n independent values $x_1, \ldots, x_n$,
$x_i$ being drawn from a population $\Pi_i$ with mean $\mu_i$ and variance $\sigma_i^2$. Suppose we have
a function $\theta$ defined by

$$ \theta = \sum_{j=1}^{s} b_j p_j $$

where the b's are known and the parameters $p_j$ depend on the $\mu$'s according to the equation

$$ \mu_i = \sum_{j=1}^{s} a_{ij} p_j, \qquad s < n, $$

the a's also being known. Then an unbiassed estimator of $\theta$, say t, with minimum variance
may be written

$$ t = \sum_{i=1}^{n} \lambda_i x_i . $$

Show that the function t is given by substituting for the p's in the expression for $\theta$ the
functions q given by minimising

$$ S = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Big( x_i - \sum_{j=1}^{s} a_{ij} q_j \Big)^2 $$

with regard to the q's considered as independent variables.

Show further that if this minimum value is $S_0$ the estimated variance of t is

$$ \frac{S_0}{n - s} \sum (\lambda_i^2 \sigma_i^2) . $$


25.6. In a feeding experiment there are given five different foods, each of which is 
available in four grades. It is desired to feed each animal with one grade of each food, 
but only one, so that a comparison may be made of the effect of the different grades of any 
particular food. Use the Graeco-Latin square to show how the feeding can be carried 
out. 

25.7. A water diviner is to be taken to ten spots and asked to say whether water
is present below the surface. It is decided to choose five spots where water is known for
certain to exist and five where it is known not to exist. The order in which the spots are
to be presented is determined by spinning a coin, heads denoting water and tails not-water.

The spinning of the coin results in the first five trials giving heads. Would you
accept this result or spin again?

25.8. Show that a Latin square may be regarded as a three-way classification in
which $p^2$ members are not zero, but $p^3 - p^2$ members vanish. Derive the analysis of
variance for the Latin square from this approach and generalise it to the Graeco-Latin
square.



CHAPTER 26 

GENERAL THEORY OF SIGNIFICANCE-TESTS—(1) 

Hypotheses to be Considered 

26.1. The kind of hypothesis which we test in statistics is more restricted than the 
general scientific hypothesis. It is a scientific hypothesis that every particle of matter 
in the universe attracts every other particle, or that Homer was blind; but these are not 
hypotheses such as arise for testing from the statistical viewpoint. A review of the various
tests which have been introduced earlier in this book indicates that the great majority
specify something about a population. Some merely assert a general fact such as “ the
population is continuous ” or “ the population is rectangular ”. Others are more definite,
as for instance “ the population is normal and has a mean $\mu_0$ ”; and again others are less
definite in one direction and more definite in another, e.g. “ the population has unit
variance ”. It is also usually a part of the hypothesis that the sample from which the inference
is being made was obtained by a random process.

26.2. Suppose we have a set of random variables $x_1, \ldots, x_n$. In the sample space
W of n dimensions the sample-point whose co-ordinates are $x_1, \ldots, x_n$ determines a point
E, say, with a distribution function which we may write as P(E). If w is any region in
W, we may derive the probability that E falls in w, say $P(E \in w)$. Then we shall say that
any hypothesis concerning the law $P(E \in w)$ is a statistical hypothesis. If it determines
the law completely we shall call it simple. In the contrary case it is said to be composite.

For instance, in testing the significance of the mean of a sample of n, it is a statistical
hypothesis that the parent is normal. This is composite, as also is the hypothesis that
the parent is normal with mean $\mu$ or the hypothesis that the parent is normal with variance
$\sigma^2$. The hypothesis that the parent is normal with mean $\mu$ and variance $\sigma^2$ is simple because
then the parent is fully determined.

Example 26.1

In sampling from a population dichotomised into classes possessing the attributes
A or not-A, say in proportion $\varpi$ and $\chi$ (= 1 - $\varpi$), the sampling distribution is the binomial
$(\chi + \varpi)^n$. This is completely determined by the value of $\varpi$, and hence a hypothesis as
to the value of $\varpi$ is simple. Such, for instance, would be the hypothesis that male and
female births occur in equal proportions. Similarly, in a multiple classification with
proportions $\varpi_1, \varpi_2, \ldots, \varpi_s$, a simple hypothesis would specify values for all the $\varpi$'s; if only
one were specified and s were greater than two the hypothesis would be composite.

In sampling from a bivariate normal population characterised by two means, two
variances and a correlation, a hypothesis about any one parameter would be composite,
and similarly for a hypothesis concerning two, three or four parameters. Only if all five
were specified in addition to the normality of the parent would the hypothesis be simple;
and this notwithstanding the fact that the sampling distribution of the means is independent
of the other three parameters, and that of the correlation coefficient independent
of the other four.

26.3. A hypothesis which determines the law $P(E \in w)$ completely except for $\nu$
parameters is sometimes said to have $\nu$ degrees of freedom. Such a hypothesis may be
regarded as an aggregate of simple hypotheses. For instance, the hypothesis that a population
is normal with mean $\mu$ is the aggregate, for all $\sigma^2$, of hypotheses that it is normal with
mean $\mu$ and variance $\sigma^2$.

26.4. The kind of argument we have used in testing hypotheses, for both large and
small samples, is of this character: assuming that the hypothesis is true, we can, with
any assigned probability $\alpha$, find a region $w_0$ in the sample space W such that the probability
of E falling in $W - w_0$ is $\alpha$. We call $W - w_0$ the region of acceptance and the complementary
domain $w_0$ the critical region. (This is the nomenclature of Chapter 19.) If our observed
E falls in $w_0$ we reject the hypothesis; if not we accept it. As a rule, in practical cases,
our regions $w_0$ are determined by the values of some statistic such as $\bar{x}$ in testing the mean.

Errors of First and Second Kind 

26.5. In general, as we saw in Chapter 19, there are many possible regions of 
acceptance for any given hypothesis and any given probability level $\alpha$. For all of them 
we shall err in proportion $1 - \alpha$ of the cases in the long run by rejecting the hypothesis 
if $E$ falls in the critical region, provided that the hypothesis is true. But what about the 
case when it is not true? We cannot ignore this case, for its possible existence is the very 
reason for carrying out the test. It is of no use whatever to know merely what the test will 
do when the hypothesis is true without regard to its behaviour in the contrary case; for if 
we are to consider only the events which happen when the hypothesis is true we have no 
right to use a test based on that assumption to reject it. 

By having regard to the behaviour of the test when the hypothesis is not true we are 
able to lay down criteria for choosing among the various tests obeying the rule 

$$P\{E \in w_0 \mid H_0\} = 1 - \alpha, \tag{26.1}$$

where $H_0$ is the hypothesis. In fact we shall seek for the test which, while obeying (26.1), 
minimises the risk of accepting $H_0$ when an alternative hypothesis $H_1$ is true and $H_0$ 
accordingly is false. That is to say, we shall endeavour to find $w_0$ such that, in addition 
to (26.1), we also have 

$$1 - P\{E \in w_0 \mid H_1\} = \text{minimum}. \tag{26.2}$$

26.6. From a slightly different viewpoint we may say that there are two possible 
errors in judging a statistical hypothesis: 

(a) We may reject it when we ought to accept it, that is, when it is true. 

(b) We may accept it when we ought to reject it, that is, when it is false. 

These are known as errors of the first and second kind respectively. The error of the 
first kind we can control exactly by setting up the proper region of acceptance determined 
by $\alpha$. Errors of the second kind cannot be controlled in this way, but we can sometimes 
calculate their probabilities, and in any case can try to reduce them to a minimum. This 
is the fundamental idea, first given explicit expression by Neyman and E. S. Pearson, 
which determines most of the work in the present and succeeding chapters. 
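
The two kinds of error may be made concrete by a small numerical sketch. The fragment below is an illustration of our own and no part of the original argument: it assumes a normal parent of known unit variance, a hypothesis that the mean is `mu0`, a single alternative `mu1` less than `mu0`, and a critical region formed by small values of the sample mean. The function name and its arguments are hypothetical; `size` stands for the content $1 - \alpha$ of the critical region.

```python
from statistics import NormalDist

def error_probabilities(mu0, mu1, n, size, sigma=1.0):
    """Errors of the first and second kind for the critical region x_bar <= x0.

    Assumes (for illustration only) a normal parent with known sigma and mu1 < mu0.
    `size` is the content of the critical region under H0 (1 - alpha in the text).
    """
    se = sigma / n ** 0.5                              # standard error of the mean
    x0 = NormalDist(mu0, se).inv_cdf(size)             # P(x_bar <= x0 | H0) = size
    first_kind = NormalDist(mu0, se).cdf(x0)           # reject H0 although it is true
    second_kind = 1.0 - NormalDist(mu1, se).cdf(x0)    # accept H0 although H1 is true
    return first_kind, second_kind

# Shrinking the critical region lowers the first error but raises the second.
for size in (0.10, 0.05, 0.01):
    e1, e2 = error_probabilities(mu0=0.0, mu1=-1.0, n=9, size=size)
    print(f"size={size:.2f}  first kind={e1:.4f}  second kind={e2:.4f}")
```

The first error is fixed by construction; only the second depends on the alternative, which is why the alternative has to be specified before regions can be compared.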

26.7. The possibility of finding regions of acceptance obeying (26.2) clearly depends 
on a precise specification of what alternative hypotheses are under consideration. We 
had better emphasise the importance of this point. It is customary to speak, and even, 
in a loose kind of way, to think of testing a hypothesis without reference to alternatives. 
To take the case of testing for normality, we often say that the hypothesis under test is 
that the population is normal without specifying what other form it might have. The 
reader may say that the alternative he has in mind is merely the negation of the hypothesis, 
namely that the population is not normal. But if so he will find it very difficult (in my 
own view impossible) to justify any of his tests on a logical basis. He will calculate certain 
statistics and accept the hypothesis if their values are consonant with the normal values; 
but it will always be possible to find other populations for which the observed values are 
even closer to expectation. If agreement between theoretical and observed values is the 
criterion he should reject normality in favour of these alternative hypotheses. It is not 
until he specifies his alternatives and considers errors of the second kind that some firm 
foundation for intuitive processes begins to appear. 

26.8. Perhaps it may help to clarify the fundamental concepts of the present approach 
if we consider a simple illustration where the hypothesis under test $H_0$ is simple and there 
is only one alternative $H_1$ which is also simple. In Fig. 26.1 we show diagrammatically the 
scatter of sample-points which would arise in samples of two, $x_1$ and $x_2$, the cluster on the 
right being that due to $H_0$ and the one on the left to $H_1$. In practice, of course, the sampling 
distributions are more usually continuous, but the dots will indicate roughly the condensation 
of sample density round central values. 

In determining the critical region we have to find an area in the $(x_1, x_2)$ plane such that 
its "content" is $1 - \alpha$. Two possible areas are shown, $w_0$ being the area to the left of 
the line PQ, and $w_0'$ the area between the lines AB and BC. In either case the proportion 
in the critical regions of the frequency on hypothesis $H_0$ is $1 - \alpha$, and if we reject $H_0$ 
whenever the sample-point falls in $w_0$ (and similarly for $w_0'$) we shall commit an error of the 
first kind in proportion $1 - \alpha$ of the cases in the long run. 

Consider errors of the second kind. By using the region $w_0$ we should reject $H_0$, and 
therefore accept $H_1$, every time the sample-point arose from $H_1$, that is to say in practically 
all the cases where $H_1$ was true, since nearly all the sample-points arising from $H_1$ lie in 
$w_0$. Errors of the second kind are therefore very rare. On the other hand, if we were to 
use $w_0'$ we should accept $H_0$ every time a sample-point arose from $H_1$ but did not fall between 
the lines AB and BC, that is to say fairly frequently. Clearly $w_0$ is the better critical 
region and has a much smaller error of the second kind than $w_0'$. 
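
The comparison just made may be imitated numerically in one dimension. The sketch below is our own illustration under stated assumptions (a normal parent of unit variance, samples of nine, $H_0$ that the mean is 0 and $H_1$ that it is $-1$; all names are hypothetical). It sets up two critical regions of the same content under $H_0$, one a tail lying on the side of $H_1$ and the other a narrow band about the hypothetical mean, and compares their errors of the second kind.

```python
from statistics import NormalDist

def second_kind_errors(mu0=0.0, mu1=-1.0, n=9, size=0.05):
    """Compare two critical regions with equal content `size` under H0."""
    se = 1.0 / n ** 0.5
    h0, h1 = NormalDist(mu0, se), NormalDist(mu1, se)

    # Region 1: the tail x_bar <= c on the side where H1 lies.
    c = h0.inv_cdf(size)
    power_tail = h1.cdf(c)

    # Region 2: the band |x_bar - mu0| <= h, also of content `size` under H0.
    h = h0.inv_cdf(0.5 + size / 2.0) - mu0
    power_band = h1.cdf(mu0 + h) - h1.cdf(mu0 - h)

    # Error of the second kind is the complement of the power.
    return 1.0 - power_tail, 1.0 - power_band

tail_error, band_error = second_kind_errors()
print(tail_error, band_error)
```

Both regions commit the first kind of error equally often, yet the band almost never rejects $H_0$ when $H_1$ is true, which is the one-dimensional counterpart of the region between AB and BC.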

26.9. It is to be noted that the argument does not depend on the relative frequencies 
of occurrence of the hypotheses $H_0$ and $H_1$. This is generally true. There is no concealed 
form of Bayes' postulate in this approach. 

26.10. When there are $n$ variates and $p$ unknown parameters the geometrical 
representation can be extended by imagining a sample-space $W$ of $n$ dimensions adjoined to 
a parameter space of $p$ dimensions. We cannot draw a picture of such a case on a 
two-dimensional sheet of paper, but the geometrical imagery and terminology of the method 
are frequently useful. A graphical illustration of a two-dimensional sample-space and 
a one-dimensional parameter space has already been given in Fig. 19.3. 


The Power Function 


26.11. If for a simple hypothesis $H_0$, (26.1) is true we define 

$$P\{E \in w_0 \mid H_1\} = p(H_1 \mid w_0) \tag{26.3}$$

as the power of the critical region $w_0$ with respect to $H_1$. Clearly the power is greatest 
when the probability of an error of the second kind is least. 

In the expression on the left of (26.3) we regard the probability that $E$ falls in $w_0$ as 
dependent on $H_1$, the hypothesis alternative to $H_0$. In the expression on the right we have 
regard to the power of the test for $H_1$ as dependent on $w_0$. 

If there exists a particular region $w_0$ with greater power than any other region obeying 
(26.1) we shall say that it is the best critical region, and the test based on it will be called 
the most powerful test. 


26.12. We proceed to consider in turn the following cases: 

(a) $H_0$ simple; one alternative $H_1$ which is simple. 

(b) $H_0$ simple; an alternative $H_1$ which is composite but can be regarded as an aggregate 
of simple alternatives. 

(c) $H_0$ and $H_1$ composite but expressible as aggregates of simple hypotheses. 


Simple Hypotheses: One Simple Alternative 


26.13. Suppose the parent population is continuous, so that the simultaneous 
distribution of the $n$ sample values $x_1, \ldots, x_n$ is continuous; and let the frequency functions 
of the sample values on hypotheses $H_0$ and $H_1$ be $p_0(x_1, \ldots, x_n)$ and $p_1(x_1, \ldots, x_n)$ 
respectively. Write $dx$ for the element $dx_1 \ldots dx_n$. Then we have 

$$\int_{w_0} p_0\,dx = 1 - \alpha \tag{26.4}$$

and wish to maximise, for variations in the domain $w_0$, the integral 

$$\int_{w_0} p_1\,dx. \tag{26.5}$$

This is a problem in the Calculus of Variations and is equivalent to maximising 
unconditionally the integral 

$$\int_{w_0} (k p_1 - p_0)\,dx \tag{26.6}$$

or, what is the same thing, to minimising 

$$\int_{w_0} (p_0 - k p_1)\,dx, \tag{26.7}$$

where $k$ is a constant to be determined by (26.4). 

It is known that the condition for a stationary value of (26.7) is that, on the 
boundary of $w_0$, 

$$p_0 - k p_1 = 0. \tag{26.8}$$

If the solution is a minimum we have, inside $w_0$, 

$$p_0 \le k p_1 \tag{26.9}$$

and outside $w_0$, 

$$p_0 > k p_1. \tag{26.10}$$

This solution to the problem is fairly obvious on general grounds. If $U$ is a function which 
is sometimes positive and sometimes negative, with a line of demarcation where it is zero 
(as must exist in virtue of continuity), we clearly minimise $\int U\,dx$ by taking into the region 
$w_0$ all the points for which $U$ is negative and no more. This gives us (26.9) and (26.10), 
and the boundary of $w_0$ is the locus for which $U$ vanishes. By convention we regard the 
boundary as included in $w_0$, which accounts for the equality in (26.9) and its absence in 
(26.10). 


26.14. The conditions expressed by (26.8), (26.9) and (26.10) are sufficient as well 
as necessary. For let $w_1$ be any other region for which 

$$\int_{w_1} p_0\,dx = 1 - \alpha.$$

If $w_0$ and $w_1$ have a common part denote it by $w_{01}$. Then 

$$\int_{w_0 - w_{01}} p_0\,dx = 1 - \alpha - \int_{w_{01}} p_0\,dx = \int_{w_1 - w_{01}} p_0\,dx$$

and hence, from (26.9) for the points inside $w_0$ and (26.10) for those outside it, 

$$k \int_{w_0 - w_{01}} p_1\,dx \ge \int_{w_0 - w_{01}} p_0\,dx = \int_{w_1 - w_{01}} p_0\,dx > k \int_{w_1 - w_{01}} p_1\,dx.$$

Adding to both sides $k \int_{w_{01}} p_1\,dx$, we have 

$$k \int_{w_0} p_1\,dx > k \int_{w_1} p_1\,dx, \tag{26.11}$$

and for positive $k$, the power of $w_1$ is less than that of $w_0$ and the latter is the best 
critical region. 

Both in this section and implicitly in the last we have required $k$ to be positive. That 
it must be so if $w_0$ is to exist emerges from (26.8), for $p_0$ and $p_1$ are essentially not negative, 
and if $k$ were negative no solution for real variate-values would exist. 
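
The criterion of (26.9) and (26.10) can be checked by direct enumeration in a discrete analogue. This is an assumption of ours, for the text supposes a continuous parent: the sample space below has six points with invented probabilities (expressed in hundredths so that the arithmetic is exact), and the names are hypothetical. Points are admitted to the critical region in increasing order of $p_0/p_1$, and the result is compared with an exhaustive search over every region of the same content under $H_0$.

```python
from itertools import combinations

P0 = [5, 5, 10, 20, 25, 35]    # probabilities under H0, in hundredths (invented)
P1 = [30, 25, 20, 10, 10, 5]   # probabilities under H1, in hundredths (invented)

def likelihood_ratio_region(size):
    """Admit points in increasing order of p0/p1 while the H0 content stays <= size."""
    order = sorted(range(len(P0)), key=lambda i: P0[i] / P1[i])
    region, content = [], 0
    for i in order:
        if content + P0[i] > size:
            break
        region.append(i)
        content += P0[i]
    return tuple(sorted(region)), content

def brute_force_best(size):
    """Among all regions whose H0 content is exactly `size`, find the most powerful."""
    candidates = [
        r for m in range(1, len(P0) + 1)
        for r in combinations(range(len(P0)), m)
        if sum(P0[i] for i in r) == size
    ]
    return max(candidates, key=lambda r: sum(P1[i] for i in r))

region, content = likelihood_ratio_region(10)
best = brute_force_best(10)
power = sum(P1[i] for i in region)
print(region, content, power, best)
```

In a discrete space an arbitrary content cannot always be attained exactly, which is one reason why the text confines itself to the continuous case.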

Example 26.2 

Consider the normal population 

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\}\,dx, \qquad -\infty < x < \infty.$$

Let the hypothesis $H_0$ be that $\mu = a_0$, and the alternative that $\mu = a_1$. We have 

$$p_0 = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\tfrac{1}{2} \sum_{j=1}^{n} (x_j - a_0)^2\Big\}.$$

We can conveniently express this in terms of the sample mean $\bar{x}$ and the sample variance 
$s^2$, obtaining for the density function 

$$p_0 = \frac{1}{(2\pi)^{n/2}} \exp\Big[-\frac{n}{2}\{(\bar{x} - a_0)^2 + s^2\}\Big].$$

A similar expression is found for $p_1$ and thus, for the boundaries of the best critical region, 
we have 

$$k = \frac{p_0}{p_1} = \exp\Big[-\frac{n}{2}\{(\bar{x} - a_0)^2 - (\bar{x} - a_1)^2\}\Big] = \exp\Big\{\frac{n}{2}(a_0 - a_1)(2\bar{x} - a_0 - a_1)\Big\}.$$

This yields for the critical region 

$$(a_0 - a_1)(2\bar{x} - a_0 - a_1) \le \frac{2}{n} \log k,$$

or 

$$(a_0 - a_1)\,\bar{x} \le \tfrac{1}{2}(a_0^2 - a_1^2) + \frac{1}{n} \log k = (a_0 - a_1)\,\bar{x}_0, \text{ say.}$$

If $a_1 < a_0$ the region is then defined by 

$$\bar{x} \le \bar{x}_0$$

but if $a_1 > a_0$ it is defined by 

$$\bar{x} \ge \bar{x}_0.$$

The reader should compare the two cases on a diagram similar to that of Fig. 26.1. 
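
As a check on the algebra of this example, the following sketch (ours, with hypothetical names, and assuming the unit-variance normal parent of the example) finds the boundary $\bar{x}_0$ from the sampling distribution of the mean for a critical region of given content, deduces the corresponding $k$ from the relation above, and confirms that the ratio $p_0/p_1$ computed directly from a sample whose mean is $\bar{x}_0$ agrees with it.

```python
import math
from statistics import NormalDist

def best_region_boundary(a0, a1, n, size):
    """Boundary x0 of the best critical region and the matching constant k.

    Unit-variance normal parent, as in the example; assumes a1 != a0.
    `size` is the content of the critical region under H0.
    """
    se = 1.0 / math.sqrt(n)
    if a1 < a0:
        x0 = NormalDist(a0, se).inv_cdf(size)          # region x_bar <= x0
    else:
        x0 = NormalDist(a0, se).inv_cdf(1.0 - size)    # region x_bar >= x0
    k = math.exp(0.5 * n * (a0 - a1) * (2.0 * x0 - a0 - a1))
    return x0, k

def log_likelihood_ratio(xs, a0, a1):
    """log(p0 / p1) computed directly from the sample values."""
    return -0.5 * sum((x - a0) ** 2 - (x - a1) ** 2 for x in xs)

x0, k = best_region_boundary(a0=0.0, a1=-1.0, n=4, size=0.05)
sample_on_boundary = [x0 - 0.3, x0 + 0.3, x0 - 0.1, x0 + 0.1]   # mean equals x0
print(x0, k, math.exp(log_likelihood_ratio(sample_on_boundary, 0.0, -1.0)))
```

The boundary depends on the alternative only through the sign of $a_1 - a_0$, whereas $k$ changes with $a_1$.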
Example 26.3 

Consider again the normal population when the mean is known, say zero, but the 
variance unknown, e.g. 

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{x^2}{2\sigma^2}\Big)\,dx, \qquad -\infty < x < \infty.$$

We now find, for hypotheses $\sigma = \sigma_0$ and $\sigma = \sigma_1$, 

$$k = \frac{p_0}{p_1} = \Big(\frac{\sigma_1}{\sigma_0}\Big)^n \exp\Big\{-\frac{n}{2}(\bar{x}^2 + s^2)\Big(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\Big)\Big\},$$

which yields, for the best critical region, 

$$(\bar{x}^2 + s^2)(\sigma_0^2 - \sigma_1^2) \le \frac{2\sigma_0^2\sigma_1^2}{n} \log\Big\{k\Big(\frac{\sigma_0}{\sigma_1}\Big)^n\Big\} = v\,(\sigma_0^2 - \sigma_1^2), \text{ say.}$$

Thus our critical regions are defined by 

$$m_2 = \bar{x}^2 + s^2 \le v \quad \text{if } \sigma_1 < \sigma_0,$$

$$m_2 = \bar{x}^2 + s^2 \ge v \quad \text{if } \sigma_1 > \sigma_0.$$

The best critical regions in the space $W$ are thus bounded by hyperspheres centred at the 
origin. Whether we take the space inside or the space outside a particular hypersphere 
as the critical region depends on the alternative hypothesis. The probabilities concerned 
can be evaluated directly without evaluating the constants $k$ and $v$. In fact, the 
probability of exceeding a given value of 

$$\frac{nv}{\sigma_0^2} = \frac{n(\bar{x}^2 + s^2)}{\sigma_0^2} = \chi_0^2$$

is obtainable from the $\chi^2$-distribution with $n$ degrees of freedom, and hence the relation 
between $v$ and $\alpha$ can be ascertained from the $\chi^2$-integral. 

In this particular case we may find without difficulty the power of an alternative test 
which would suggest itself on intuitive grounds. Suppose we find $n v'/\sigma_0^2 = \chi_1^2$ from 
the $\chi^2$-distribution corresponding to $n - 1$ degrees of freedom and probability level $\alpha$, 
and use, instead of the hyperspheres centred at the origin, those centred at the sample mean 

$$s^2 \le v', \qquad s^2 \ge v'.$$

Suppose that the alternative $H_1$ is that $\sigma_1^2 = 1.1\,\sigma_0^2$. In testing $H_0$ for the 
alternative $\sigma_1 > \sigma_0$ we should, for the test based on $v$, find $\chi_0^2$ and accept $\sigma_0$ if 

$$\frac{n m_2}{\sigma_0^2} < \chi_0^2.$$

For instance, with $n = 5$, $1 - \alpha = 0.01$ we find $\chi_0^2 = 15.086$. The probability of an error 
of the second kind is 

$$\int_{W - w_0} p_1\,dx = \int_0^{\chi_0^2/1.1} dF(\chi^2),$$

i.e. is obtained from the $\chi^2$-integral with argument $15.086/1.1 = 13.71$, giving 
$p(H_1 \mid w_0) = 0.018$. 

On the other hand, had we used $\chi_1^2$ instead of $\chi_0^2$ we should have entered the table with 
four degrees of freedom, giving 13.277. Divided by 1.1 this gives 12.07, resulting in a 
probability of rather less than 0.017. This is the power of the second test and is lower 
than that of the first test, as of course it must be since the latter has maximum power. 
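
The figures quoted in this example can be reproduced without tables. The sketch below is our own; it assumes only the standard closed forms for the upper tail of the $\chi^2$-distribution with a whole number of degrees of freedom (a finite series, with the normal integral added when the number is odd), and the function name is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of chi-square for a positive integer df (closed forms)."""
    chi = math.sqrt(x)
    z = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)   # normal density at chi
    if df % 2 == 0:
        # Even df: sqrt(2*pi) * z * (1 + x/2 + x^2/(2*4) + ...)
        term, total = 1.0, 1.0
        for r in range(1, (df - 2) // 2 + 1):
            term *= x / (2.0 * r)
            total += term
        return math.sqrt(2.0 * math.pi) * z * total
    # Odd df: two normal tails plus 2 * z * (chi + chi^3/3 + chi^5/(3*5) + ...)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2.0 * r + 1.0)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * z * total

ratio = 1.1                                  # sigma_1^2 / sigma_0^2 under H1
power_best = chi2_sf(15.086 / ratio, 5)      # region n*m_2/sigma_0^2 >= 15.086
power_alt = chi2_sf(13.277 / ratio, 4)       # region n*s^2/sigma_0^2 >= 13.277
print(round(power_best, 4), round(power_alt, 4))
```

With so small a sample and an alternative so close to the hypothesis, both powers are scarcely greater than the content 0.01 of the critical region; the point of the comparison is only their order.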


Simple Hypotheses: Families of Simple Alternatives 

26.15. Consider now the case where $H_0$ is simple but $H_1$ is composite and consists 
of a family of simple alternatives. The most frequently occurring case is the one in which 
we have a class of simple hypotheses $\Omega$ of which $H_0$ is one and $H_1$ comprises the remainder; 
for example, the hypothesis $H_0$ may be that a mean has some value $\mu_0$ and the hypothesis 
$H_1$ that it has some other value unspecified. 

For each of these other values we may apply the foregoing results, and find, 
corresponding to any particular member of $H_1$, say $H_t$, a best critical region $w_t$. But this 
region in general will vary from one $H_t$ to another. We obviously cannot determine a 
different region for all the unspecified possibilities and are therefore led to inquire whether 
there exists, among the family of best critical regions $w_t$, one which is the best for all of 
them. Such a region is called Uniformly Most Powerful and the test based on it the 
Uniformly Most Powerful test, conveniently shortened to U.M.P. test. 


26.16. Unfortunately, as we shall find below, the U.M.P. test does not usually 
exist unless we restrict our family $\Omega$ in certain ways. Consider, for instance, the case 
dealt with in Example 26.2. We found there that for $a_1 < a_0$ the best critical region for 
a simple alternative was defined by 

$$\bar{x} \le \bar{x}_0.$$

Now the boundaries of the regions determined by $\bar{x}$ = constant do not depend on $a_1$ and 
can be found directly from the sampling distribution of $\bar{x}$ when the probability level 
$1 - \alpha$ is given. Consequently the regions defined by $\bar{x} \le \bar{x}_0$ are the same for all 
$a_1 < a_0$ and hence the test is U.M.P. for the class of hypotheses that $a_1 < a_0$. It is difficult 
to see how a better test could be devised, for, whatever $a_1$ subject to $a_1 < a_0$, the test 
controls errors of the first kind and minimises those of the second. 

However, if $a_1 > a_0$ the best critical regions are defined by $\bar{x} \ge \bar{x}_0$. Here again, if 
our class $\Omega$ is confined to the values of $a_1$ greater than $a_0$ the test is U.M.P. But if $a_1$ can 
be either greater or less than $a_0$ no U.M.P. test is possible. The reader will easily verify 
for himself that the same is true for the test considered in Example 26.3. 
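
The failure of either one-sided region against alternatives on the wrong side can be seen from its power function. The following is a sketch of our own under the assumptions of Example 26.2 (unit variance), with hypothetical function names and an arbitrarily chosen sample of nine and critical regions of content 0.05.

```python
from statistics import NormalDist

def power_left(a1, a0=0.0, n=9, size=0.05):
    """Power against a1 of the region x_bar <= x0, best when a1 < a0."""
    se = 1.0 / n ** 0.5
    x0 = NormalDist(a0, se).inv_cdf(size)
    return NormalDist(a1, se).cdf(x0)

def power_right(a1, a0=0.0, n=9, size=0.05):
    """Power against a1 of the region x_bar >= x0, best when a1 > a0."""
    se = 1.0 / n ** 0.5
    x0 = NormalDist(a0, se).inv_cdf(1.0 - size)
    return 1.0 - NormalDist(a1, se).cdf(x0)

for a1 in (-1.0, -0.5, 0.5, 1.0):
    print(a1, round(power_left(a1), 4), round(power_right(a1), 4))
```

Each region is the more powerful on its own side, and on the other side its power falls below the content of the critical region itself, so that neither can serve for alternatives of both signs.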

26.17. We now show formally that for a simple hypothesis depending on $\theta_0$ (the 
value taken by the parameter $\theta$ defining a family of alternatives) no U.M.P. test exists 
for both positive and negative values of $\theta - \theta_0$ if the frequency function $p(E \mid \theta)$ is 
continuous, has everywhere a continuous derivative with respect to $\theta$ which does not vanish 
identically, and admits of differentiation under the sign of integration over $W$. 

Suppose that such a test does exist. Then for any $\theta$ we have, inside $w_0$, 

$$p_0 \le k p,$$

which we may write 

$$p(E \mid \theta) \ge h(\theta)\,p_0(E \mid \theta_0). \tag{26.12}$$

Likewise, for any point $E'$ on the boundary of $w_0$ we have 

$$p(E' \mid \theta) = h(\theta)\,p_0(E' \mid \theta_0). \tag{26.13}$$

By hypothesis $p$ is differentiable in $\theta$ and hence so is $h$. Moreover, as $\theta \to \theta_0$, 
$h(\theta) \to 1$. Hence if 

$$\Delta = \theta - \theta_0$$

and primes denote differentiation with respect to $\theta$, we have 

$$h(\theta) = 1 + \Delta\,[h']_{\theta_0 + q\Delta} = 1 + \Delta \left[\frac{\partial}{\partial\theta}\,\frac{p(E' \mid \theta)}{p_0(E' \mid \theta_0)}\right]_{\theta_0 + q\Delta}, \qquad 0 < q < 1. \tag{26.14}$$

Further we have 

$$p(E \mid \theta) = p_0(E \mid \theta_0) + \Delta\,[p'(E \mid \theta)]_{\theta_0 + r\Delta}, \qquad 0 < r < 1. \tag{26.15}$$

Substituting in (26.12) from (26.14) and (26.15), we find 

$$\Delta \left\{ [p'(E \mid \theta)]_{\theta_0 + r\Delta} - \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\,[p'(E' \mid \theta)]_{\theta_0 + q\Delta} \right\} \ge 0. \tag{26.16}$$

This is true for any $E$ and $E'$ and for all $\Delta$, whatever its sign, and hence the expression in 
curly brackets vanishes in the limit as $\Delta \to 0$. Thus we have 

$$[p'(E \mid \theta)]_{\theta_0} - \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\,[p'(E' \mid \theta)]_{\theta_0} = 0. \tag{26.17}$$

Similarly this equation may be shown to hold outside $w_0$, and hence it is true throughout $W$. 
Now we have 

$$\int_W p(E \mid \theta)\,dx = 1,$$

and hence, differentiating with respect to $\theta$ and putting $\theta = \theta_0$, 

$$\int_W [p'(E \mid \theta)]_{\theta_0}\,dx = 0.$$

Substituting from (26.17), we have 

$$\frac{[p'(E' \mid \theta)]_{\theta_0}}{p_0(E' \mid \theta_0)} \int_W p_0(E \mid \theta_0)\,dx = 0,$$

and hence 

$$\frac{[p'(E' \mid \theta)]_{\theta_0}}{p_0(E' \mid \theta_0)} = 0. \tag{26.18}$$

Thus, from (26.17) 

$$[p'(E \mid \theta)]_{\theta_0} = 0. \tag{26.19}$$

But this implies that the derivative of $p$ with respect to $\theta$ is identically zero at $\theta_0$, which 
is contrary to hypothesis. The theorem follows. 

It may be noted that in deriving (26.17) from (26.16) we used the property that $\Delta$ 
may have either sign. If it can have only one sign, that is, if our class of admissible 
alternatives is confined to the case when either $\theta < \theta_0$ or $\theta > \theta_0$, a U.M.P. test may exist; and 
so we found in Examples 26.2 and 26.3. 


Best Critical Regions and Likelihood 

26.18. Since on the boundary of a best critical region we have $p_0 - k p_1 = 0$, that 
boundary is determined by the condition that on it the ratio of the likelihoods of the two 
functions corresponding to $H_0$ and $H_1$ is constant. 

Consider now the case where $H_1$ comprises a set of alternatives varying according to 
the parameter $\theta$, $H_0$ being one of them. In accordance with the principle of maximum 
likelihood we should obtain, as the most likely value of $\theta$, the solution of 

$$\frac{\partial p}{\partial \theta} = 0, \tag{26.20}$$

where $\theta$ is then expressed as a function of the variables. If this value is substituted in 
$p$, we obtain the distribution with greatest likelihood which may be written $p(\Omega \text{ max.})$. 
The surfaces of constant likelihood are defined for this distribution by 

$$p_0 - \lambda\,p(\Omega \text{ max.}) = 0. \tag{26.21}$$

Now these surfaces are, in fact, the envelopes of the family, varying with $\theta$, 

$$p_0 - k p_\theta = 0, \tag{26.22}$$

for to obtain the envelope we differentiate with respect to $\theta$, giving $\partial p / \partial \theta = 0$, and 
eliminate $\theta$, leading back to (26.21). Thus, if there exists a best critical region (and hence 
a U.M.P. test) for all permissible alternatives $H_\theta$, such a region will be the envelope with 
respect to such alternatives and will therefore be identical with a region defined by (26.21); 
and hence a test based on the principle of likelihood leads to best critical regions, if they exist. 

If, as is more usual, there is no common best critical region, the ratio of the likelihood 
of $H_0$ to that of any particular $H_\theta$ is $k$. The surface (26.21) remains the envelope of the 
family of surfaces (26.22) for which $k = \lambda$. 


Example 26.4 

Consider once again the normal form, where both mean $\mu_0$ and variance $\sigma_0^2$ are specified 
and the admissible alternatives are that they can have any values, subject of course to the 
variance being positive. For any given $\mu_1$ and $\sigma_1$ the best critical region will be given by 

$$\frac{1}{\sigma_0^n} \exp\Big[-\frac{n}{2\sigma_0^2}\{(\bar{x} - \mu_0)^2 + s^2\}\Big] \le \frac{k}{\sigma_1^n} \exp\Big[-\frac{n}{2\sigma_1^2}\{(\bar{x} - \mu_1)^2 + s^2\}\Big]$$

or 

$$\frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x} - \mu_1)^2 + s^2}{\sigma_1^2} \ge \frac{2}{n} \log\Big\{\frac{1}{k}\Big(\frac{\sigma_1}{\sigma_0}\Big)^n\Big\}.$$

This may be written in the form 

$$\frac{\sigma_1^2 - \sigma_0^2}{\sigma_0^2\sigma_1^2}\,\{(\bar{x} - \rho)^2 + s^2\} \ge \text{constant},$$

where 

$$\rho = \frac{\mu_0\sigma_1^2 - \mu_1\sigma_0^2}{\sigma_1^2 - \sigma_0^2}.$$

Thus, if $\sigma_1 > \sigma_0$ we have 

$$(\bar{x} - \rho)^2 + s^2 \ge v^2, \text{ say;}$$

and if $\sigma_1 < \sigma_0$ we have 

$$(\bar{x} - \rho)^2 + s^2 \le v^2.$$

For any specified $\mu_1$ and $\sigma_1$ the best critical regions are bounded by hyperspheres with radius 
$v\sqrt{n}$ and centre at $x_1 = x_2 = \ldots = x_n = \rho$. Owing to the fact that $\rho$ varies with $\mu_1$ and 
$\sigma_1$, there will not in general be a best common critical region and a U.M.P. test; and this 
remains true even if we limit our alternatives to $\sigma_1 < \sigma_0$ and $\mu_1 < \mu_0$ or by similar 
inequalities. 

We may regard $\bar{x}$ and $s$ as independent variables and represent the data on a 
two-way plane $(\bar{x}, s)$. The best critical regions are then seen to be bounded by circles with 
centre $(\rho, 0)$ and radius $v$. Fig. 26.2 (adapted from Neyman and Pearson, 1933c) illustrates 
some of the contours for particular cases. A single curve, corresponding to a single 
probability level, is shown in each case. 

Cases (1) and (2): $\sigma_1 = \sigma_0$ and $\rho = \pm\infty$. The best critical region lies on the right 
of the line (1) if $\mu_1 > \mu_0$ and on the left of (2) if $\mu_1 < \mu_0$. This is the case discussed in 
Example 26.2. 

Case (3): $\sigma_1 < \sigma_0$, say $\sigma_1 = \tfrac{1}{2}\sigma_0$. Then $\rho = \mu_0 + \tfrac{4}{3}(\mu_1 - \mu_0)$ and the region lies 
inside the semicircle marked (3). 

Case (4): $\sigma_1 < \sigma_0$ and $\mu_1 = \mu_0$. The region is inside the semicircle (4). 

Case (5): $\sigma_1 > \sigma_0$ and $\mu_1 = \mu_0$. The region is outside the semicircle (5). 

Fig. 26.2. Contours of Constant Likelihood in a Two-dimensional Case. (See text.) 

There is evidently no common best critical region for these cases. The regions of 
acceptance, however, may have a common part, centred round the value $(\mu_0, \sigma_0)$, and we 
should expect them to do so. Let us find the envelope of the best critical regions, which 
is, of course, the same as that of the regions of acceptance. The likelihood ratio is 

$$k = \Big(\frac{\sigma_1}{\sigma_0}\Big)^n \exp\Big[-\frac{n}{2}\Big\{\frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x} - \mu_1)^2 + s^2}{\sigma_1^2}\Big\}\Big].$$

The partial differentials with respect to $\mu_1$ and $\sigma_1$ equated to zero give 

$$\frac{n}{\sigma_1} - \frac{n s^2}{\sigma_1^3} - \frac{n}{\sigma_1}\Big(\frac{\bar{x} - \mu_1}{\sigma_1}\Big)^2 = 0, \qquad \frac{n}{\sigma_1^2}(\bar{x} - \mu_1) = 0,$$

whence we find $\mu_1 = \bar{x}$ and $\sigma_1 = s$, and the envelope is 

$$k = \Big(\frac{s}{\sigma_0}\Big)^n \exp\Big[-\frac{n}{2}\Big\{\frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - 1\Big\}\Big].$$

The dotted curve in Fig. 26.2 shows one such envelope. It touches the boundaries of all 
the critical regions which have the same likelihood-ratio $k$. The space inside may be 
regarded as a "good" region of acceptance and the space outside accordingly as a good 
critical region. 

There is no best region for all alternatives, but the regions determined by envelopes 
of likelihood-ratio regions effect a sort of compromise by picking out and amalgamating 
parts of critical regions which are best for individual alternatives. 
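
The quantity whose constant values trace these envelopes is easily computed for a given sample. The sketch below is ours and the function name is hypothetical; it evaluates the ratio of $p_0$ to the greatest likelihood attainable by varying both mean and variance, on the assumption that $s^2$ is the sample variance with divisor $n$, as in the identities used above.

```python
import math

def likelihood_ratio(xs, mu0, sigma0):
    """Ratio p0 / p(Omega max.) for a normal sample, maximising over mean and variance."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n      # divisor n, as assumed in the lead-in
    q = ((mean - mu0) ** 2 + s2) / sigma0 ** 2
    return (s2 / sigma0 ** 2) ** (n / 2.0) * math.exp(-0.5 * n * (q - 1.0))

# A sample agreeing exactly with H0 in mean and variance attains the maximum value 1.
print(likelihood_ratio([-1.0, 1.0], mu0=0.0, sigma0=1.0))
# A sample far from H0 gives a small ratio and falls outside the envelope.
print(likelihood_ratio([0.5, 2.0, 3.5], mu0=0.0, sigma0=1.0))
```

A "good" critical region in the sense of the text consists of the samples for which this ratio falls below a chosen value.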


Example 26.5 


In the previous example we have supposed that the sample space $W$ was the same for 
all admissible alternatives. This is quite legitimate, for we can always regard the domain 
of variation as infinite by supposing that $p = 0$ outside the range of the 
frequency-distribution of the variates. In the normal case, of course, $p$ does not vanish 
anywhere, so that we are compelled to consider $W$ as infinite. 

When, however, the sample-space for non-vanishing $p$ is bounded, special 
circumstances may arise, and it is occasionally necessary to consider separately the different 
discriminating regions. For instance, if the sample-spaces corresponding to $H_0$ and $H_1$ 
are $W_0$ and $W_1$, it may happen that $W_0$ and $W_1$ have no common part when both $p_0$ and 
$p_1$ are greater than zero. If so, we can distinguish between $H_0$ and $H_1$ with certainty. 
If there is a common region $W_{01}$ then $W_1 - W_{01}$ should be included in the best critical 
region, for to do so cannot increase the probability of errors of the first kind, $p_0$ being 
zero there. But it does not follow that this should constitute the whole of the critical 
region, for we might then commit too many errors of the second kind, i.e. accept $H_0$ too 
often when $H_1$ is true. We may then wish to add to $W_1 - W_{01}$ a region $w_{00}$, making $w_0$ 
altogether, such that $w_{00}$ lies inside $W_{01}$ and $p_0(E \in w_{00}) = p_0(E \in w_0) = 1 - \alpha$. 
This controls the first kind of error at the probability level $\alpha$ and reduces the second 
kind of error. 

Consider the population 

$$p(x) = \frac{1}{b}, \qquad a - \tfrac{1}{2}b \le x \le a + \tfrac{1}{2}b,$$

$$p(x) = 0, \qquad \text{elsewhere.}$$

Suppose a sample of $n$ to have been drawn from a population of this kind where $b$ is known. 
We wish to test whether $a$ has some value $a_0$ as against the alternative $a_1$. 

The sample-spaces $W_0$ and $W_1$ are hypercubes centred at $a_0$ and $a_1$. If they have 
a common part $W_{01}$ the probabilities $p_0$ and $p_1$ in that part are both proportional to the 
volume and $p_0/p_1 = 1$ everywhere in the region. If, then, we take any region $w_{00}$ of 
content $1 - \alpha$ in $W_{01}$ and add it to $W_1 - W_{01}$, we get a best critical region, and there 
are clearly infinitely many such. 

For the admissible alternatives $a_1$ the hypercube $W_1$ will move along the long diagonal 
$x_1 = x_2 = \ldots = x_n$ as $a_1$ varies, and we cannot always find a common region of size 
$1 - \alpha$ to form $w_{00}$. By taking such a region as a hypercube of side $b(1 - \alpha)^{1/n}$, however, 
fitted into one of the corners of $W_0$ lying on the long diagonal, we "nearly" obtain such an 
object since this region provides what is required so long as $W_0$ and $W_1$ have a common 
part of content $1 - \alpha$. Which corner we choose depends on whether the hypothesis is 
$a_1 > a_0$ or $a_1 < a_0$. 
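
The behaviour of the corner region can be worked out exactly. The following sketch is our own and rests on our reading of the example: the alternative is $a_1 \le a_0$, so that the small hypercube is fitted into the lower corner of $W_0$, and the critical region consists of that hypercube together with $W_1 - W_{01}$. The function name is hypothetical and `size` stands for $1 - \alpha$.

```python
def size_and_power(a0, a1, b, n, size):
    """Content under H0 and power under a1 <= a0 of (W_1 - W_01) plus the corner cube.

    The cube has side b * size**(1/n) and sits in the lower corner of W_0.
    """
    shift = a0 - a1                           # distance moved along each axis
    side = b * size ** (1.0 / n)              # side of the corner cube
    overlap = max(b - shift, 0.0)             # edge of the common part W_01
    content_h0 = (side / b) ** n              # equals `size` by construction
    # P1(outside W_0) + P1(corner cube), the cube being cut off by W_1 if need be.
    power = 1.0 - (overlap / b) ** n + (min(side, overlap) / b) ** n
    return content_h0, power

for a1 in (0.0, -0.2, -0.6):
    print(a1, size_and_power(a0=0.0, a1=a1, b=1.0, n=4, size=0.05))
```

So long as the common part has an edge at least as long as the side of the small hypercube the region has exactly the required content in $W_{01}$; beyond that the two hypotheses are distinguished with certainty.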
Relation between U.M.P. Tests and Sufficient Estimators 

26.19. It was thought at one time that the existence of a set of U.M.P. tests for 
a continuous range of admissible alternatives involved the existence of a sufficient estimator 
for the parameter concerned. This does not appear to be true in full generality, but is 
so in nearly all the cases occurring in statistical practice. We will prove a theorem on the 
subject: 

If a system of U.M.P. tests exists and if any point in the sample-space lies on the 
boundary of a best critical region, then a sufficient estimator exists for the parameter whose 
variation provides the admissible alternatives.* 

It is enough to show that for an arbitrary point we have 

$$p_1(E) = h(t, \theta)\,p_0(E) \tag{26.23}$$

for then $t$ is sufficient for $\theta$ by definition. Now we know that on the boundary of a critical 
region we have 

$$\frac{p_1(E)}{p_0(E)} = \frac{1}{k} = h, \text{ say,}$$

where $h$ varies with the $x$'s and with $\theta$. We show that $h$ has the form $h(t, \theta)$ by defining 
a function $t$ and showing that if $t$ has the same value at any two points $E_1$ and $E_2$, then 

$$\frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}$$

for all $\theta$. 


26.20. For this purpose we require a lemma to the following effect: if a set of U.M.P. 
tests exists, it will be said to be ordered if the condition $\alpha_1 > \alpha_2$ implies that the critical 
region $w(\alpha_1)$ is included in the region $w(\alpha_2)$; and if a set of U.M.P. tests exists but is not 
ordered we can always find another set which is. 

$w(\alpha_1)$ and $w(\alpha_2)$ may include parts of $W$ where $p$ vanishes. Let the remaining parts 
be $v(\alpha_1)$ and $v(\alpha_2)$ and, if $v_0$ is the common part of these regions, write 

$$v(\alpha_1) = v_0 + v', \qquad v(\alpha_2) = v_0 + v'', \tag{26.24}$$

where $v_0$, $v'$ and $v''$ have no common points. Now for any value of $\theta$ and for any $E$ in 
$w(\alpha_1)$, and therefore in $v'$, there is an $h_1$ such that 

$$p_1(E) \ge h_1\,p_0(E) \text{ in } v',$$

$$p_1(E) \le h_1\,p_0(E) \text{ outside, and therefore in } v''.$$

Similarly, within $w(\alpha_2)$ and hence within $v''$ we have an $h_2$ such that 

$$p_1(E) \ge h_2\,p_0(E) \text{ in } v'',$$

$$p_1(E) \le h_2\,p_0(E) \text{ in } v'.$$

It follows that, from the inequalities deriving from $v''$, $h_1 \ge h_2$, and similarly, from $v'$, 
$h_2 \ge h_1$. Hence $h_1 = h_2 = h$, say, and 

$$p_1(E) = h\,p_0(E) \tag{26.26}$$

within $v'$ and $v''$ for any $\theta$. 


* The theorem remains true if there is a set of points of measure zero for which the 
condition as to boundaries is not fulfilled. It is also true for several parameters, as may be 
seen by an easy generalisation of the argument. See Neyman and Pearson (1936a). 
Now take 

$$u(\alpha_1) = v_0 + v''',$$

where $v'''$ is a part of $v''$, such that 

$$\int_{u(\alpha_1)} p_0\,dx = 1 - \alpha_1. \tag{26.27}$$

This is always possible, for the integral of $p_0$ over $v_0 + v''$ is $1 - \alpha_2$, which is greater than 
$1 - \alpha_1$. It follows from (26.27) and the first equation of (26.24) that 

$$\int_{v'''} p_0\,dx = \int_{v'} p_0\,dx. \tag{26.28}$$

Now put 

$$w'(\alpha_1) = W_0 + u(\alpha_1) = W_0 + v_0 + v''',$$

where $W_0$ is the part of $W$ for which $p_0 = 0$. Then from (26.27) 

$$\int_{w'(\alpha_1)} p_0\,dx = 1 - \alpha_1.$$

Further, $w'(\alpha_1)$ is a best critical region with respect to admissible alternatives, for (26.26) 
and (26.28) imply that 

$$\int_{v'''} p_1\,dx = \int_{v'} p_1\,dx,$$

and hence 

$$\int_{w'(\alpha_1)} p_1\,dx = \int_{w(\alpha_1)} p_1\,dx.$$

Finally, $w'(\alpha_1)$ is wholly included in $w(\alpha_2)$. 

We have therefore replaced the region $w(\alpha_1)$ by another region $w'(\alpha_1)$ with the same 
properties except that it is included in $w(\alpha_2)$. The lemma follows. 


26.21. To return now to the main proposition, let $E$ be any point of $W$. If it belongs 
to only one boundary of a best critical region with content $1 - \alpha$ we put $t(E) = 1 - \alpha$. 
If it belongs to more than one, we put $t(E)$ equal to the mean between the upper and lower 
bounds of values of $1 - \alpha$ for which the boundaries include $E$. In virtue of the lemma, 
this implies that whatever the value of $1 - \alpha$ between these bounds, the corresponding 
boundary must contain $E$. 

Thus $t$ is defined everywhere. Further, if it has the same value at two points $E_1$ and 
$E_2$ these points must lie on the same boundary. It follows that on this boundary 

$$\frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}$$

and hence the theorem is proved. 

The converse is not generally true, but one has to exercise some ingenuity and import 
some artificiality to construct examples where it fails. Cf. Exercises 26.3 and 26.4. 

Composite Hypotheses 

26.22. We shall consider a class $\Omega$ of admissible hypotheses depending on $r + s$ 
parameters $\theta_1, \ldots, \theta_r, \ldots, \theta_{r+s}$ and shall regard the hypothesis $H_0$ under test as one of 
this class. A composite hypothesis of $r$ degrees of freedom is one for which $s$ of the 
parameters, say $\theta_{r+1}, \ldots, \theta_{r+s}$, are specified, the hypothesis determining the distribution 
apart from the unspecified parameters. For example, the hypothesis that a population 
is normal with specified mean, nothing being supposed about the variance, is a composite 
hypothesis of one degree of freedom. It will be assumed that any admissible simple 
alternative is given by specifying the other $r$ parameters $\theta_1, \ldots, \theta_r$ and that there is a common 
sample-space $W$ for all such alternatives. 

Regions Similar to the Sample Space 

26.23. In order to test the composite hypothesis $H_0$ we need in the first place to 
control errors of the first kind by determining a critical region $w$, such that 

$$\int_w p_0\,dx = 1 - \alpha. \tag{26.29}$$

This, however, differs from the simple case in that $p_0$ can vary according to the unknown 
parameters, and to be certain of controlling the error we must be able to find $w$ such that 
(26.29) is true whatever $\theta_1, \ldots, \theta_r$. If this can be done we shall call the region $w$ similar 
to the sample-space $W$ and shall speak of $1 - \alpha$ as its size. 

The problem of testing composite hypotheses then becomes one of (a) finding the 
similar regions, and (b) selecting from among those regions the one which minimises the 
second kind of error for a simple admissible alternative $H_t$. If this is the same for all 
$H_t$ we shall have a common best critical region. 


26.24. We consider in the first place the composite hypothesis with one degree of 
freedom. The general problem of finding similar regions in such a case has not been solved, 
but a solution is possible in one important class of case, namely, that for which 

(a) $p_0$ is indefinitely differentiable with respect to $\theta_1$ for almost all values of $\theta_1$, 

(b) the function $p_0$ obeys the relation 

$$\phi' = A + B\phi, \tag{26.30}$$

where 

$$\phi = \frac{\partial}{\partial \theta_1} \log p_0, \tag{26.31}$$

the prime denotes differentiation with respect to $\theta_1$, and $A$ and $B$ depend on $\theta_1$ but not 
on the $x$'s. In particular the normal distribution is of this type. 

Under conditions (a) and (b) it follows that for $w$ to be similar to $W$ it is necessary and 
sufficient that 

$$\int_w \frac{\partial^k p_0}{\partial \theta_1^k}\,dx = 0, \qquad k = 1, 2, \ldots \tag{26.32}$$

Let $w$ be a region for which (26.32) is true. Then for $k = 1$ and 2 we have 

$$\int_w p_0\,\phi\,dx = 0,$$

$$\int_w p_0\,(\phi^2 + \phi')\,dx = 0.$$

In virtue of (26.30), this last may be written 

$$\int_w p_0\,(\phi^2 + A + B\phi)\,dx = 0,$$

whence 

$$\int_w p_0\,\phi^2\,dx = -A \int_w p_0\,dx = -A(1 - \alpha). \tag{26.33}$$
Differentiating (26.33) with respect to $\theta_1$ and using previous results, we find

$$\int_w p_0\,\phi^3\,dx = (2AB - A')\,(1 - \alpha), \qquad (26.34)$$

and generally

$$\int_w p_0\,\phi^k\,dx = (1 - \alpha)\,\psi_k(\theta_1), \qquad (26.35)$$

where $\psi_k(\theta_1)$ is a function of $\theta_1$ only, and is therefore independent of $w$. Now (26.32) is
true for $W = w$, and we find

$$\int_W p_0\,\phi^k\,dx = \psi_k(\theta_1), \qquad (26.36)$$

so that

$$\frac{1}{1-\alpha}\int_w p_0\,\phi^k\,dx = \int_W p_0\,\phi^k\,dx. \qquad (26.37)$$


Now consider the random variable $\phi$. Since $p_0$ integrated through $w$ is equal to $1 - \alpha$,
we may regard $p_0/(1-\alpha)$ as a frequency function defined in $w$. It follows from (26.37) that
the moments of $\phi$ in this domain are the same as those of $\phi$ in $W$. Consequently, if the
moments determine the distribution uniquely, the distributions of $\phi$ are identical.

Hence we may use the hypersurfaces $\phi$ = constant to set up similar regions. The
space $W$ may be imagined as composed of shells of infinite thinness bounded by these
hypersurfaces. If we determine an “area” on one of these shells equal to $1 - \alpha$ times
its area in $W$, the totality of such areas will constitute a region $w$ of size $1 - \alpha$; and since
this will be so irrespective of $\theta_1$ the region $w$ is similar to $W$.


26.25. When similar regions are determined by the above method we have to find
the best critical region from among them. Let $H_t$ be a simple admissible alternative.
We require to find from the regions $w$ a region $w_0$ such that

$$\int_{w_0} p_t\,dx = \text{maximum}. \qquad (26.38)$$

We now show that this is equivalent to maximising

$$\int_{w(\phi)} p_t\,dw(\phi) \qquad (26.39)$$

subject to

$$\int_{w(\phi)} p_0\,dw(\phi) = (1 - \alpha)\int_{W(\phi)} p_0\,dW(\phi). \qquad (26.40)$$

Here $w(\phi)$ means the element of $w$ for constant $\phi$, the “shell” of the previous section.
The object of this is to reduce our present case to that of simple hypotheses. We take
$\phi$ as a new variable and consider together the remaining variables (which amounts to
determining similarity of $w$ and $W$ in each separate shell between $\phi$ and $\phi + d\phi$, as in the previous
section), and are thus left with regions dependent on $\phi$. Equation (26.39) then requires
that the probability of the second kind of error in each shell must be a minimum, subject
to the control of the first kind asserted by (26.40).
Suppose that (26.39) were not maximised. There would then exist a set of values of
$\phi$ for each of which we could determine a region $v(\phi)$ such that

$$\int_{v(\phi)} p_0\,dv(\phi) = (1 - \alpha)\int_{W(\phi)} p_0\,dW(\phi) \qquad (26.41)$$

and

$$\int_{v(\phi)} p_t\,dv(\phi) > \int_{w_0(\phi)} p_t\,dw_0(\phi). \qquad (26.42)$$

Let $E$ be this set of values of $\phi$ and $CE$ the remaining set. We prove our result by
obtaining a contradiction, namely by defining a region $v$ which is similar to $W$, and such that

$$\int_v p_t\,dx > \int_{w_0} p_t\,dx, \qquad (26.43)$$

which contradicts (26.38).

Take as $v$ the shells of hypersurfaces (1) in $CE$ which are identical with $w_0(\phi)$ and
(2) in $E$ which satisfy (26.42). Now

$$\int_v p_t\,dx = \int_{E+CE} d\phi \int_{v(\phi)} p_t\,dv(\phi)$$

and

$$\int_{w_0} p_t\,dx = \int_{E+CE} d\phi \int_{w_0(\phi)} p_t\,dw_0(\phi).$$

Hence

$$\int_v p_t\,dx - \int_{w_0} p_t\,dx = \int_{E+CE} d\phi\left\{\int_{v(\phi)} p_t\,dv(\phi) - \int_{w_0(\phi)} p_t\,dw_0(\phi)\right\}$$

$$= \int_E d\phi\left\{\int_{v(\phi)} p_t\,dv(\phi) - \int_{w_0(\phi)} p_t\,dw_0(\phi)\right\} > 0, \qquad (26.44)$$

which is the contradiction required.


26.26. Thus our problem is reduced to that of finding, in the shells $\phi$ = constant,
portions $w_0(\phi)$ which maximise the integral of $p_t$. We have, so to speak, brought the
problem down one dimension by locating it in shells instead of dealing with it throughout
the spaces $w$ and $W$. It now becomes that of a simple hypothesis in $(n - 1)$ dimensions,
and the best critical region is the one for which

$$p_t \ge k\,p_0, \qquad (26.45)$$

where $k$ is a function of $\phi$. The sum of these regions for the various values of $\phi$ gives us
the complete solution to the problem, and if this sum has boundaries which are independent
of $H_t$ we have a common best critical region and a U.M.P. test.

Example 26.6: “Student's” Hypothesis

A single sample is taken from a normal population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}dx,$$

with unspecified $\sigma$. We have then one degree of freedom, $\theta_1 = \sigma$, and the hypothesis $H_0$
is that $\mu = \mu_0$, say.

We find

$$\phi = \frac{\partial}{\partial\sigma}\log p_0 = -\frac{n}{\sigma} + \frac{\Sigma(x-\mu_0)^2}{\sigma^3} = -\frac{n}{\sigma} + \frac{n}{\sigma^3}\{(\bar{x}-\mu_0)^2 + s^2\},$$

$$\phi' = \frac{\partial\phi}{\partial\sigma} = \frac{n}{\sigma^2} - \frac{3\,\Sigma(x-\mu_0)^2}{\sigma^4} = -\frac{2n}{\sigma^2} - \frac{3\phi}{\sigma}.$$

Condition (26.30) is satisfied, and $\phi$ is constant over the hypersurfaces

$$\Sigma(x-\mu_0)^2 = n\{(\bar{x}-\mu_0)^2 + s^2\} = \text{constant}.$$

The hypersurfaces are hyperspheres in $W$. To construct a similar region we have merely
to pick out a region of size $1 - \alpha$ on each shell and to amalgamate them. In our present
case this is particularly easy because $p_0$ is constant over the shells and we need only pick
out areas on each shell bearing to the area of the hypersphere the ratio $1 - \alpha$.

These areas need not be of the same shape or similarly situated. By selecting them
in different ways an infinite variety of regions may be constructed. We have to find the
best for an alternative simple hypothesis $\sigma = \sigma_1$, $\mu = \mu_1$.

The condition (26.45) becomes

$$\frac{1}{\sigma_1^n}\exp\left[-\frac{n}{2\sigma_1^2}\{(\bar{x}-\mu_1)^2 + s^2\}\right] \ge \frac{k}{\sigma^n}\exp\left[-\frac{n}{2\sigma^2}\{(\bar{x}-\mu_0)^2 + s^2\}\right].$$

As we are dealing with regions which are similar with regard to $\sigma$, we may put $\sigma = \sigma_1$
and find

$$\bar{x}\,(\mu_1 - \mu_0) \ge \tfrac{1}{2}(\mu_1^2 - \mu_0^2) + \frac{\sigma_1^2}{n}\log k = (\mu_1 - \mu_0)\,k_1, \text{ say},$$

where $k_1 = k_1(\phi)$. Thus we find, for the boundary of $w_0(\phi)$,

if $\mu_1 > \mu_0$, $\bar{x} \ge k_1(\phi)$
if $\mu_1 < \mu_0$, $\bar{x} \le k_1(\phi)$

where $k_1$ has to be chosen so as to satisfy

$$\int_{w_0(\phi)} p_0\,dw(\phi) = (1 - \alpha)\int_{W(\phi)} p_0\,dW(\phi).$$

Thus on any particular shell the “cap” cut off by the hyperplane $\bar{x}$ = constant must have
area $1 - \alpha$ times that of the shell and hence must subtend the same solid angle at the origin. Consequently the
boundaries lie on a right hypercircular cone through the point whose co-ordinates are all
equal to $\mu_0$ and whose axis is perpendicular to $\bar{x} = 0$, namely the line $x_1 = x_2 = \ldots = x_n$.
For each $\alpha$ there will be a different cone. If $\mu_1 > \mu_0$ the cones will be in the
positive quadrant and in the contrary case in the negative quadrant.

Furthermore, these regions are independent of $\mu_1$. Thus for the class of hypothesis
$\mu_1 > \mu_0$ or $\mu_1 < \mu_0$ (but not both together) the common best critical regions and U.M.P.
tests exist.

Finally we have to evaluate $\alpha$ in terms of the sample values determining the critical
cones. We have already seen in Example 10.6 (vol. I, p. 239) that if $z = (\bar{x} - \mu_0)/s$ the
frequency inside the cone is

$$\frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\;\Gamma\left(\frac{n-1}{2}\right)}\int \frac{dz}{(1 + z^2)^{n/2}} = \alpha.$$

Thus “Student's” test, which we have previously considered on more or less intuitive
grounds, is now seen to be the best in the sense of the theory herein developed, for the
admissible class $\mu_1 > \mu_0$ or for that $\mu_1 < \mu_0$.
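
The similarity property of Example 26.6 can be illustrated numerically. The following is only a sketch under stated assumptions: the function names, the sample size, the cut-off `z_crit` and the seed are arbitrary illustrative choices, not values from the text. It simulates samples under $H_0$ for two very different values of $\sigma$ and records how often $z = (\bar{x} - \mu_0)/s$ exceeds the cut-off; because $z$ is unchanged when every observation is rescaled, the rejection frequency should not depend on $\sigma$.

```python
import math
import random


def student_z(sample, mu0):
    """Student's ratio z = (mean - mu0) / s, with s^2 computed with divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / n
    return (mean - mu0) / math.sqrt(s2)


def rejection_rate(sigma, n=8, z_crit=0.7, trials=4000, seed=1):
    """Fraction of simulated H0 samples (mu0 = 0) falling in the region z > z_crit.

    n, z_crit, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        if student_z(sample, 0.0) > z_crit:
            rejected += 1
    return rejected / trials


if __name__ == "__main__":
    for sigma in (0.5, 5.0):
        print(sigma, rejection_rate(sigma))
```

With a common seed the two runs use the same underlying normal deviates, so the two frequencies agree almost exactly, which is the sense in which the region is similar to the sample space with regard to $\sigma$.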


Example 26.7

Consider a sample from the normal population with unspecified mean, the hypothesis
being that $\sigma = \sigma_0$. We now find

$$\phi = \frac{\partial}{\partial\mu}\log p_0 = \frac{n(\bar{x}-\mu)}{\sigma_0^2},$$

$$\phi' = \frac{\partial\phi}{\partial\mu} = -\frac{n}{\sigma_0^2},$$

so that (26.30) is satisfied.

The hypersurfaces $\phi$ = constant are the hyperplanes $\bar{x}$ = constant, and any regions
of size $1 - \alpha$ on these hyperplanes will provide similar regions $w$. The condition $p_t \ge k\,p_0$
will be found to reduce to

$$s^2(\sigma_0^2 - \sigma_1^2) \le -(\bar{x}-\mu_1)^2(\sigma_0^2 - \sigma_1^2) + 2\sigma_0^2\sigma_1^2\left\{\log\frac{\sigma_0}{\sigma_1} - \frac{1}{n}\log k\right\} = (\sigma_0^2 - \sigma_1^2)\,k_1, \text{ say}.$$

If $\sigma_1 > \sigma_0$ we have $s^2 \ge k_1(\phi)$

and if $\sigma_1 < \sigma_0$ we have $s^2 \le k_1(\phi)$.

Since $s^2$ is independent of $\bar{x}$, $k_1$ will be a function of $\alpha$ and $n$ only. The best critical
regions are those given by $s^2 \ge s_0^2$ and $s^2 \le s_0^2$ as the case may be, and the appropriate
values of $s_0$ corresponding to $\alpha$ may be found from the known distribution of $s^2$. The
critical regions are hypercylinders, and again there are two sets of best common critical
regions, according as $\sigma_1 > \sigma_0$ or $\sigma_1 < \sigma_0$.


Composite Hypotheses : Several Degrees of Freedom 

26.27. As a preliminary to extending the theory for one degree of freedom to the
case of several degrees, we note that if a region $w$ is similar to $W$ with regard to $\theta_1 \ldots \theta_r$
jointly, then it is so for each of them separately; and conversely. The direct result is
obvious and the converse follows in this way (we need prove it only for $r = 2$ because
the rest follows step by step). If then

$$\int_w p\,dx = 1 - \alpha$$

is true for $\theta_2, \theta_3 \ldots \theta_r$ independently of $\theta_1$, and for $\theta_1, \theta_3 \ldots \theta_r$ independently of $\theta_2$,
then it is true for any values of $\theta_1$ and $\theta_2$ and any other fixed values of $\theta_3 \ldots \theta_r$; and
hence it is true independently of $\theta_1$ and $\theta_2$ together.
26.28. An additional preliminary requirement is the concept of independence of
a family of surfaces of a parameter. Suppose

$$f_j(x_1 \ldots x_n, \theta) = C_j, \qquad j = 1, 2 \ldots k < n \qquad (26.46)$$

represents a family of surfaces, where $\theta$ and the $C$'s are variable parameters. Let
$S(\theta, C_1 \ldots C_k)$ be the intersection of these surfaces, or, if $k = 1$, the surfaces themselves.
Consider the family obtained by fixing $\theta$ and allowing the $C$'s to vary. Then if any surface
of this family for $\theta_1$ can also be obtained from a second family for $\theta_2$, we shall say that the
family is independent of $\theta$. We get the same aggregate of intersections however $\theta$ is chosen.
For example, if

$$f_1 = (x_1 - \theta)^2 + (x_2 - \theta)^2 + (x_3 - \theta)^2 = C_1$$

and

$$f_2 = x_1 + x_2 + x_3 = C_2,$$

the family $S$ consists of circles in planes at right angles to the line $x_1 = x_2 = x_3$ and having
their centres on that line. This is true however $\theta$ is chosen, and $S$ is therefore
independent of $\theta$.

26.29. Under certain restrictive conditions similar to those of 26.24 it is now possible
to find solutions to the problem of determining best critical regions. We assume

(1) that $\dfrac{\partial^k p_0}{\partial\theta_j^k}$ exists almost everywhere for all $k$ and $j = 1 \ldots r$;

(2) that if $\phi_j = \dfrac{\partial}{\partial\theta_j}\log p_0$ and $\phi_j' = \dfrac{\partial\phi_j}{\partial\theta_j}$,

then $$\phi_j' = A_j + B_j\,\phi_j; \qquad (26.47)$$

(3) that the family of surfaces given by the intersections of $\phi_j = C_j$ is independent of
$\theta_j$ for $j = 1 \ldots r$.

Subject to these conditions (which are sufficient but not necessary) similar regions exist.
Consider any two surfaces $\phi_1$ and $\phi_2$. Since $w$ is similar with respect to $\theta_1$ alone, we may
find surfaces $\phi_1$ = constant and

$$\int_{w(\phi_1)} p\,dw(\phi_1) = (1 - \alpha)\int_{W(\phi_1)} p\,dW(\phi_1). \qquad (26.48)$$

In accordance with assumption (3), the family of surfaces $\phi_1 = C_1$ is independent of $\theta_2$.
Thus if $\theta_2$ varies, $W(\phi_1)$ and $w(\phi_1)$ will not vary, though perhaps they may correspond to
other values of $C_1$. Furthermore, (26.48) is true regardless of $\theta_2$. Hence within the shell
$\phi_1$ = constant we can repeat the analysis used for one degree of freedom. We find that
the necessary and sufficient condition for $w$ to be similar to $W$ with regard to both $\theta_1$ and $\theta_2$ is

$$\int_{w(\phi_1,\phi_2)} p_0\,dw(\phi_1,\phi_2) = (1 - \alpha)\int_{W(\phi_1,\phi_2)} p_0\,dW(\phi_1,\phi_2), \qquad (26.49)$$

where $W(\phi_1,\phi_2)$ is the intersection of $\phi_1 = C_1$, $\phi_2 = C_2$ for any values of $C_1$ and $C_2$; and similarly
for $w(\phi_1,\phi_2)$.

As before, the most general region $w$ is obtained by amalgamating the portions of size
$(1 - \alpha)$ on the intersections of $\phi_1$ and $\phi_2$. The generalisation to $r$ degrees of freedom is
immediate. It also follows in the usual way that the best critical region is the one for
which

$$\int_{w_0} p_t\,dx \ge \int_v p_t\,dx,$$

$v$ being any other region of size $1 - \alpha$; and $w_0$ is defined by

$$p_t \ge h(\phi_1 \ldots \phi_r)\,p_0. \qquad (26.50)$$

The following examples will illustrate the theory.


Example 26.8. Ratio of Two Variances

Suppose we have two samples of $n_1$, $n_2$ members from independent normal populations
whose means and variances are unknown. The joint distribution may be expressed as

$$p \propto \frac{1}{\sigma_1^{n_1}\sigma_2^{n_2}}\exp\left[-\frac{n_1}{2\sigma_1^2}\{(\bar{x}_1-\mu_1)^2 + s_1^2\} - \frac{n_2}{2\sigma_2^2}\{(\bar{x}_2-\mu_2)^2 + s_2^2\}\right].$$

Consider the composite hypothesis $\sigma_1 = \sigma_2 = \sigma$, say. This has three degrees of freedom,
for $\mu_1$, $\mu_2$ and $\sigma$ are unspecified. Writing $\mu_1 = \mu$ and $\mu_2 = \mu + b$, as the alternative $H_t$ we will take

$$\theta_1 = \mu_t, \quad \theta_2 = b_t, \quad \theta_3 = \sigma_t, \quad \theta_4 = \sigma_2/\sigma_1,$$

so that under $H_t$ we have $\sigma_1 = \sigma_t$ and $\sigma_2 = \theta_4\sigma_t$, and for $H_0$ itself

$$\theta_1 = \mu, \quad \theta_2 = b, \quad \theta_3 = \sigma, \quad \theta_4 = 1.$$

We have first to consider whether the conditions of 26.29 are satisfied.

(1) Evidently $p_0$ is differentiable for all parameters any number of times.

(2) We find

$$\phi_1 = \frac{\partial}{\partial\mu}\log p_0 = \frac{n_1}{\sigma^2}(\bar{x}_1 - \mu) + \frac{n_2}{\sigma^2}(\bar{x}_2 - \mu - b),$$

$$\phi_2 = \frac{\partial}{\partial b}\log p_0 = \frac{n_2}{\sigma^2}(\bar{x}_2 - \mu - b),$$

$$\phi_3 = \frac{\partial}{\partial\sigma}\log p_0 = -\frac{n_1 + n_2}{\sigma} + \frac{1}{\sigma^3}\{n_1(\bar{x}_1-\mu)^2 + n_2(\bar{x}_2-\mu-b)^2 + n_1 s_1^2 + n_2 s_2^2\},$$

and (26.47) is seen to be satisfied.

(3) The hypersurfaces $\phi_1 = C_1$ are evidently equivalent to

$$n_1\bar{x}_1 + n_2\bar{x}_2 = C_1',$$

where $C_1'$ is an arbitrary parameter. The hypersurfaces $\phi_2 = C_2$ give similarly

$$\bar{x}_2 = C_2'.$$

Both these are independent of $\theta_2$ and their intersections, namely $\bar{x}_1$ = constant, $\bar{x}_2$ = constant,
are independent of $\theta_3$. Thus the third condition is fulfilled and we may apply the
foregoing theory.

The equations $\phi_1$ = constant, $\phi_2$ = constant, $\phi_3$ = constant are equivalent to

$\bar{x}_1$ = constant
$\bar{x}_2$ = constant
$n_1 s_1^2 + n_2 s_2^2$ = constant = $(n_1 + n_2)\,s_a^2$, say.


The element $w_0(\phi_1, \phi_2, \phi_3)$ is part of $W(\phi_1, \phi_2, \phi_3)$ within which

$$p_t \ge p_0/h(\phi_1, \phi_2, \phi_3)$$

and this condition, by reference to the frequency function, becomes

$$\frac{h}{\sigma_t^{n_1+n_2}\,\theta_4^{n_2}}\exp\left[-\frac{n_1}{2\sigma_t^2}\{(\bar{x}_1-\mu_t)^2 + s_1^2\} - \frac{n_2}{2\theta_4^2\sigma_t^2}\{(\bar{x}_2-\mu_t-b_t)^2 + s_2^2\}\right]$$

$$\ge \frac{1}{\sigma^{n_1+n_2}}\exp\left[-\frac{1}{2\sigma^2}\{n_1(\bar{x}_1-\mu)^2 + n_1 s_1^2 + n_2(\bar{x}_2-\mu-b)^2 + n_2 s_2^2\}\right].$$

Since the region $w$ is independent of $\mu$, $b$ and $\sigma$, we may put them respectively equal to
$\mu_t$, $b_t$ and $\sigma_t$ and hence find for the condition

$$n_2(1 - \theta_4^2)\{(\bar{x}_2-\mu_t-b_t)^2 + s_2^2\} \le 2\sigma_t^2\theta_4^2\,(\log h - n_2\log\theta_4).$$

Since this inequality holds good on $\bar{x}_2$ = constant it contains only one variable $s_2^2$ and we
accordingly find two cases:

If $\theta_4 = \sigma_2/\sigma_1 > 1$ the best region is defined by $s_2^2 \ge h'(\bar{x}_1, \bar{x}_2, s_a^2)$;

If $\theta_4 = \sigma_2/\sigma_1 < 1$ the best region is defined by $s_2^2 \le h'(\bar{x}_1, \bar{x}_2, s_a^2)$.

We have now to determine $h'$ so as to satisfy

$$\int_{w_0(\phi_1,\phi_2,\phi_3)} p_0\,dx = (1 - \alpha)\int_{W(\phi_1,\phi_2,\phi_3)} p_0\,dx.$$

Now $W(\phi_1, \phi_2, \phi_3)$ is the locus for which $\bar{x}_1$, $\bar{x}_2$ and $s_a^2$ are constant, and thus the integral
on the right is the product of $1 - \alpha$ and the frequency function $p_0(\bar{x}_1, \bar{x}_2, s_a^2)$. Similarly
that on the left is the integral of this function over the region for which $s_2^2 \ge h'$. Thus

$$\int_{w_0} p_0\,dx = \int_{h'} p_0(\bar{x}_1, \bar{x}_2, s_a^2, s_2^2)\,ds_2^2 \quad \text{in the first case},$$

with a similar expression but different limits in the second. Now we have for the joint
frequency function of $\bar{x}_1$, $\bar{x}_2$, $s_1^2$ and $s_2^2$

$$f \propto s_1^{n_1-3}\,s_2^{n_2-3}\exp\left[-\frac{1}{2\sigma^2}\{n_1(\bar{x}_1-\mu_1)^2 + n_2(\bar{x}_2-\mu_2)^2 + (n_1+n_2)\,s_a^2\}\right].$$

Transforming from $s_1^2$ to $s_a^2$ as variable, we find for the condition, after a little reduction,

$$\int_{h'}^{h''}\{(n_1+n_2)s_a^2 - n_2 s_2^2\}^{\frac{n_1-3}{2}}\,s_2^{n_2-3}\,ds_2^2 = (1 - \alpha)\int_0^{h''}\{(n_1+n_2)s_a^2 - n_2 s_2^2\}^{\frac{n_1-3}{2}}\,s_2^{n_2-3}\,ds_2^2,$$

where $h'' = (n_1+n_2)\,s_a^2/n_2$. On substituting $n_2 s_2^2 = (n_1+n_2)\,s_a^2\,u$ we find

$$\int_{u_0}^1 (1-u)^{\frac{n_1-3}{2}}\,u^{\frac{n_2-3}{2}}\,du = (1 - \alpha)\int_0^1 (1-u)^{\frac{n_1-3}{2}}\,u^{\frac{n_2-3}{2}}\,du,$$

where $u_0$ is the value of $u$ corresponding to $s_2^2 = h'$. It follows that $u_0$ depends only on
$\alpha$, $n_1$ and $n_2$. Thus, whatever the values of $\bar{x}_1$, $\bar{x}_2$ and $s_a^2$, the best critical region is
defined by

$u \ge u_0$ if $\sigma_2 > \sigma_1$
$u \le u_0$ if $\sigma_2 < \sigma_1$.

These are equivalent to

$$u = \frac{n_2 s_2^2}{n_1 s_1^2 + n_2 s_2^2} \ge u_0 \quad \text{if } \sigma_2 > \sigma_1, \qquad \le u_0 \quad \text{if } \sigma_2 < \sigma_1.$$

If we put

$$z = \tfrac{1}{2}\log\left\{\frac{n_2 s_2^2/(n_2 - 1)}{n_1 s_1^2/(n_1 - 1)}\right\}$$

the B-distribution of $u$ reduces to Fisher's form. The result we have reached is therefore
equivalent to showing that the $z$-test is the best for the ratio of two variances in normal
samples. As usual, there is no U.M.P. test for the whole range of the ratio from 0 to $\infty$,
but two U.M.P. tests for the ranges 0 to 1 and 1 to $\infty$ respectively.
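
As a small illustration of the two equivalent criteria of Example 26.8, the sketch below computes $u = n_2 s_2^2/(n_1 s_1^2 + n_2 s_2^2)$ and Fisher's $z$ from two samples. The function names are hypothetical, and the form of $z$ used (half the logarithm of the ratio of the two unbiased variance estimates) is the usual definition, assumed here rather than read from the text.

```python
import math


def mean_and_variance(sample):
    """Sample mean and variance with divisor n, as used in the text."""
    n = len(sample)
    mean = sum(sample) / n
    return mean, sum((x - mean) ** 2 for x in sample) / n


def variance_ratio_criteria(sample1, sample2):
    """Return (u, z) for the ratio-of-variances test.

    u = n2*s2^2 / (n1*s1^2 + n2*s2^2), a beta-type variable on (0, 1);
    z = half the log ratio of the unbiased variance estimates (assumed form).
    """
    n1, n2 = len(sample1), len(sample2)
    _, var1 = mean_and_variance(sample1)
    _, var2 = mean_and_variance(sample2)
    u = n2 * var2 / (n1 * var1 + n2 * var2)
    z = 0.5 * math.log((n2 * var2 / (n2 - 1)) / (n1 * var1 / (n1 - 1)))
    return u, z


if __name__ == "__main__":
    print(variance_ratio_criteria([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Both quantities are monotone functions of $s_2^2/s_1^2$ for fixed sample sizes, which is why a critical region $u \ge u_0$ can equally be stated as a region in $z$.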


Example 26.9. Difference of Two Means

Consider again the previous example, where now the variances are unspecified but
equal and the means $\mu_1$ and $\mu_2 = \mu_1 + b$ may have any values. The hypothesis $H_0$ is that
$b = 0$ and has two degrees of freedom corresponding to $\mu_1$ and $\sigma$.

Let the alternative $H_t$ specify the parameters

$$\theta_1 = \mu_t, \quad \theta_2 = \sigma_t, \quad \theta_3 = b_t.$$

In addition to the quantities required in the previous Example we now use also $\bar{x}_0$ and
$s_0^2$, the mean and variance of the pooled samples.

We find that the three conditions of 26.29 are satisfied, and

$$\phi_1 = \frac{n_1+n_2}{\sigma^2}(\bar{x}_0 - \mu_1),$$

$$\phi_2 = -\frac{n_1+n_2}{\sigma} + \frac{n_1+n_2}{\sigma^3}\{(\bar{x}_0-\mu_1)^2 + s_0^2\}.$$

Equivalent to this family are the surfaces

$$\bar{x}_0 = C_1, \qquad s_0^2 = C_2.$$

The condition $p_t \ge h(\phi_1, \phi_2)\,p_0$ reduces to

$$b_t\,(\bar{x}_1 - \bar{x}_2) \le h'(\bar{x}_0, s_0),$$

and as usual we find two cases according as $\mu_2 > \mu_1$ or vice versa. We consider only the
first, the second being analogous.

Writing $v = \bar{x}_1 - \bar{x}_2$, we have to determine $h'$ by

$$\int_{h'''}^{h'} p_0(\bar{x}_0, s_0^2, v)\,dv = (1 - \alpha)\int_{h'''}^{h^{iv}} p_0(\bar{x}_0, s_0^2, v)\,dv,$$

where $h'''$ and $h^{iv}$ are the lower and upper limits of the variation of $v$ for fixed values of $\bar{x}_0$
and $s_0^2$.

The frequency function of $\bar{x}_0$, $s_0^2$, $v$ and $s_1^2$ is easily found to be

$$f \propto s_1^{n_1-3}\left\{(n_1+n_2)\,s_0^2 - n_1 s_1^2 - \frac{n_1 n_2}{n_1+n_2}\,v^2\right\}^{\frac{n_2-3}{2}}\exp\left[-\frac{n_1+n_2}{2\sigma^2}\{(\bar{x}_0-\mu_1)^2 + s_0^2\}\right],$$

whence that of $\bar{x}_0$, $s_0^2$ and $v$ is found to be

$$\propto \left\{s_0^2 - \frac{n_1 n_2}{(n_1+n_2)^2}\,v^2\right\}^{\frac{n_1+n_2-4}{2}}\exp\left[-\frac{n_1+n_2}{2\sigma^2}\{(\bar{x}_0-\mu_1)^2 + s_0^2\}\right].$$

Since $\bar{x}_0$ and $s_0^2$ are constant over the domains under consideration we have to satisfy

$$\int_{h'''}^{h'}\left\{s_0^2 - \frac{n_1 n_2}{(n_1+n_2)^2}\,v^2\right\}^{\frac{n_1+n_2-4}{2}}dv = (1 - \alpha)\int_{h'''}^{h^{iv}}\left\{s_0^2 - \frac{n_1 n_2}{(n_1+n_2)^2}\,v^2\right\}^{\frac{n_1+n_2-4}{2}}dv,$$

where

$$h^{iv} = -h''' = \frac{(n_1+n_2)\,s_0}{\sqrt{(n_1 n_2)}}.$$

If we put

$$v = \frac{(n_1+n_2)\,s_0\,z}{\sqrt{(n_1 n_2)}\,(1 + z^2)^{\frac{1}{2}}}$$

this reduces to

$$\int_{-\infty}^{z_0}\frac{dz}{(1+z^2)^{\frac{n_1+n_2-1}{2}}} = (1 - \alpha)\int_{-\infty}^{\infty}\frac{dz}{(1+z^2)^{\frac{n_1+n_2-1}{2}}},$$

and

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(n_1 s_1^2 + n_2 s_2^2)}}\sqrt{\left(\frac{n_1 n_2}{n_1+n_2}\right)}.$$

We have thus arrived at the $t$-test for the difference of two means in normal variation when
variances are equal. Once again the test we introduced on more or less intuitive grounds
has been shown to be justified in the light of the theory developed in this chapter.
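
The statistic reached in Example 26.9 can be computed directly. The sketch below is illustrative only: the function name is hypothetical, and the relation $t = z\sqrt{(n_1+n_2-2)}$ between this $z$ and the usual pooled two-sample $t$ is a standard identity assumed here rather than quoted from the text.

```python
import math


def two_sample_z_and_t(sample1, sample2):
    """Return (z, t) for the difference of two means with equal variances.

    z = (mean1 - mean2) * sqrt(n1*n2/(n1+n2)) / sqrt(n1*s1^2 + n2*s2^2),
    where n_i * s_i^2 is the sum of squares about the i-th sample mean;
    t = z * sqrt(n1 + n2 - 2) is the familiar pooled-variance statistic.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)  # n1 * s1^2
    ss2 = sum((x - mean2) ** 2 for x in sample2)  # n2 * s2^2
    z = (mean1 - mean2) * math.sqrt(n1 * n2 / (n1 + n2)) / math.sqrt(ss1 + ss2)
    t = z * math.sqrt(n1 + n2 - 2)
    return z, t


if __name__ == "__main__":
    print(two_sample_z_and_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

For the two samples in the usage line the pooled variance estimate is 50/8 = 6.25, so the textbook $t$ is $-3/(2.5\sqrt{0.4})$, which agrees with the value returned through $z$.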


Linear Hypotheses in Normal Variation 

26.30. Several of the hypotheses dealt with in foregoing examples are particular
cases of a general class known as linear hypotheses, which accounts for the fact that we
keep arriving at the same sort of conclusions respecting them.

Suppose we have $n$ independent variates typified by $x_i$ distributed in the normal form

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x_i-\mu_i)^2}{2\sigma^2}\right\}dx_i,$$

with common variance $\sigma^2$ but different means. Suppose the means are connected with
$r + s$ unknown parameters $\theta_1 \ldots \theta_r \ldots \theta_{r+s}$ by linear equations of the type

$$\mu_k = \sum_j c_{jk}\,\theta_j. \qquad (26.51)$$

Suppose further that the hypothesis $H_0$ specifies $r$ parameters

$$\theta_1 = \theta_1^0, \ldots, \theta_r = \theta_r^0,$$

and hence is composite with $s$ degrees of freedom. Then $H_0$ will be called a “linear
hypothesis”. The reader can verify for himself that “Student's” hypothesis, and the
hypothesis as to the difference of two means when variances are equal, are of this type.
The homogeneity test in variance-analysis and the test of regression coefficients are also
reducible to the same form. If, of course, $H_0$ specifies $r$ linear relations among the $\theta$'s
instead of the $\theta$'s themselves, it can be reduced to a hypothesis which specifies the $\theta$'s
directly, except perhaps in degenerate cases which need not detain us.

26.31. The theory developed in the earlier part of the chapter for composite
hypotheses may be applied to linear hypotheses as we have defined them, and the argument
follows exactly that of Examples 26.8 and 26.9. It is readily verified that the three
conditions of 26.29 are satisfied. We have

$$\phi_j' = \text{constant}, \qquad (26.52)$$

$$\phi_\sigma = -\frac{n}{\sigma} + \frac{\Sigma(x_k-\mu_k)^2}{\sigma^3}, \qquad \phi_\sigma' = -\frac{2n}{\sigma^2} - \frac{3}{\sigma}\,\phi_\sigma. \qquad (26.53)$$

We can therefore find similar regions $w(\phi_1 \ldots \phi_r, \phi_\sigma)$ and select from them the best
critical regions in the usual manner. We will omit the rather cumbrous algebra and quote
the following result (Kolodzieczyk, 1935).

Transform to new variates E^ . . . K r +„ t/ r+J , +1 . . . y n by the equation 

r+M n 

X k “ l l k + ^7 C jk C jA: VP • • • • (26.54) 

j-i ;=r+«+i 

where the c’s are those given in (26.51) for j, fc < r + $ and the other c’s are orthogonal, i.e. 


Define 


and 


K 

£ °ki c Ji = °- * j > r + « 

i -1 

= 1. & =j, j> r + s 

nS't = j? yj 

j-r+x+i 

ii / r-f# v 2 

**" - Z (Z e * E ') ■ 


A further transformation of E r . n . . . E r .^ s is now made to variables \p rV1 
that (26.57) becomes 

r r+s 

5(1 — Rjk Ej Ek + Wk • 


j,k=l 


&--T+1 


r+s 


•= nSl 4 £ V>1 • 


(26.56) 

(26.56) 

(26.57) 
V>r+s BO 

(26.58) 

(26.59) 


fc-r+l 


The coefficients R can, of course, be obtained from the c’s by ordinary determinantal 
algebra. 

Writing now — 0 } — Of, i.e. the difference between 0 } on the alternative hypothesis 
and its value if H„ is true, we find that the best critical region is given by 


y Rjk e ] Ek 


V ^(nSl 4 nSD 


i.fc-i 


Rjk e j e k^ 


> v 9 , 


(26.60) 
where v is distributed in the form 

dF oc (1 - dv - 1 <v < 1 . . . (26.61) 

and $v_0$ is given by

$$1 - \alpha = \int_{v_0}^{1} dF. \qquad (26.62)$$


26.32. There is one interesting conclusion to be drawn from (26.60). If a U.M.P.
test exists, $v$ should be independent of $\theta_j$ and hence of $\varepsilon_j$. This appears to be possible
only if the denominator in the second part of (26.60) is rational. But this denominator
is seen from (26.59) to have the coefficients of a positive definite form and hence is only
rational if $r = 1$. We conclude that if $r \ge 2$ no U.M.P. test is possible for linear hypotheses
in normal variation.

We have already seen that under general conditions no U.M.P. test exists for $r = 1$.
A similar conclusion follows from (26.60) if $r = 1$, for it then becomes


_ F\ i £ i E x 

VWn) I I 


> V 0 , 


. (26.63) 


which, as usual, leads to two cases according as $\varepsilon_1 \gtrless 0$.


26.33. We will pause at this point to review our results. We began by defining two 
kinds of error and showing that a test could be defined as “ best ” for a single alternative 
hypothesis if it controlled the first kind and reduced the second to a minimum. When 
there is a class of admissible alternatives we may sometimes arrive at a U.M.P. test which 
will minimise errors of the second kind for any member of the class, and such a test may 
be regarded as the best attainable. Though the U.M.P. test does not exist in the great 
majority of cases, we may find tests which are U.M.P. for either $\theta_1 > \theta_0$ or $\theta_1 < \theta_0$. Such
tests have been reached for “ Student’s ” hypothesis and several others in common use, 
and are found to give the same tests as those introduced on rather intuitive grounds in 
Chapter 21. 


26.34. The absence of a U.M.P. test implies that in the majority of cases we have 
to look for other criteria to provide “ best ” tests. In the remainder of this chapter and 
in the next we shall consider several lines of approach which have been developed :— 

(a) Relying on 26.18 we may evolve tests based on the likelihood ratio. These will
give U.M.P. tests if such exist, and in the contrary case will do their best, so to speak, by
finding the greatest common denominator among the best critical regions.

(b) We may consider the properties of tests when the sample number n tends to infinity, 
and so obtain tests which are U.M.P. in the limit. Such tests, like maximum likelihood 
estimators, may be employed on the grounds that they are “ best ” for large n and 
presumably good for small n. 

(c) We may derive a new criterion from the concept of bias in statistical tests, which 
will be explained in the next chapter. 

(d) Recognizing that there is no test which is U.M.P. everywhere, we may seek for 
one which is U.M.P. in the neighbourhood of the true value. The idea behind this approach 
is that it will be more important to detect errors in the neighbourhood of the true value, 
and that large errors may be left to look after themselves, either because they are infrequent
or because almost any “reasonable” test will reveal them.*


(e) When a number of independent parameters are involved, we may abandon the [...]
which they are functionally related, [...] case of a single parameter [...] and we may be
able to show that a particular [...] is the [...] tests.





Tests Based on Likelihood 


26.35. Suppose that for a given member of a composite hypothesis $H_0$ the joint
sampling distribution of the variables $x_1 \ldots x_n$ has a frequency function $p_0$ (which is,
of course, the likelihood). Considering the $x$'s as fixed, we may examine the variation of
$p_0$ according to variation in the unspecified parameters $\theta_1 \ldots \theta_r$ which form a set, say
$\omega$. Let $p_0(\omega\ \text{max.})$ be the maximum value of $p_0$ for such variation. Similarly, if $\Omega$ is
the class of admissible alternatives $H_1$, let $p_1(\Omega\ \text{max.})$ be the maximum of the likelihood
for variations of all the parameters $\theta_1 \ldots \theta_{r+s}$. Write

$$\lambda = \frac{p_0(\omega\ \text{max.})}{p_1(\Omega\ \text{max.})}. \qquad (26.64)$$

Then a possible criterion is

$$\lambda \le \text{constant} = C, \text{ say}, \qquad (26.65)$$

where $C$ is determined by relation to a probability level $\alpha$ from the sampling distribution
of $\lambda$, which of course is independent of the unknown parameters. In defining $\lambda$ we have
assumed that the maxima on the right of (26.64) exist, but we can give the equation greater
generality by taking $p_0(\omega\ \text{max.})$ as the upper bound of values of $p_0$ in the set $\omega$ where no
maximum exists; and so for $\Omega$.

In this form the criterion states that we are to accept $H_0$ if the maximum likelihood
in the set of permissible $H_0$'s is greater than a specified proportion of that in the set of
alternatives $H_1$. In doing so we control the first kind of error in the ordinary way. So
far as concerns the second kind of error we saw in 26.18 that for $H_0$ simple the criterion
provided a sort of highest common factor among available tests; and presumably qualities
of this kind will be equally useful when $H_0$ is composite.


The Problem of k Samples 

26.36. We will illustrate the theory of the likelihood tests by discussing a problem
of considerable practical importance. Suppose we have a sample from each of $k$ normal
populations, $x_{ij}$ being the $j$th member of the $i$th sample. Let

$n_i$ be the number in the $i$th sample;

$N = \Sigma(n_i)$ be the total number of observations;

$\bar{x}_i$ be the mean of the $i$th sample;

$s_i^2$ be the variance of the $i$th sample.

* An alternative line would be to concentrate on errors of the second kind for larger deviations, 
on the ground that large errors are more important than small ones. I understand from Dr. B. L. Welch 
that he considered this approach shortly before the war ; the results did not differ very materially from 
those given by requiring optimum properties near the true value in the case he examined, and the 
results were not published. 
Consider three different hypotheses:

(1) $H$, that all populations are the same and hence have the same unspecified mean and
unspecified variance.

(2) $H_1$, that they have the same variance but different unspecified means $\mu_1 \ldots \mu_k$.

(3) $H_2$, when it is known that they have the same variance, that they have the same means.

We have for the joint likelihood

$$p = \frac{1}{(2\pi)^{N/2}\prod_{i=1}^k \sigma_i^{n_i}}\exp\left[-\sum_{i=1}^k \frac{n_i\{(\bar{x}_i-\mu_i)^2 + s_i^2\}}{2\sigma_i^2}\right]. \qquad (26.66)$$

Consider first of all $H$. We find, for $p(\Omega\ \text{max.})$,

$$\mu_i = \bar{x}_i, \qquad \sigma_i^2 = s_i^2, \qquad (26.67)$$

and for $p(\omega\ \text{max.})$, putting all the $\mu$'s and $\sigma$'s equal and equating the first partials of $\log p$
to zero,

$$\mu = \bar{x}_0 = \frac{1}{N}\sum_{i=1}^k n_i\,\bar{x}_i, \qquad (26.68)$$

$$\sigma^2 = s_0^2 = \frac{1}{N}\sum_{i=1}^k n_i\{(\bar{x}_i-\bar{x}_0)^2 + s_i^2\}. \qquad (26.69)$$

Inserting these values in $p$ we find, after a little reduction,

$$\lambda_H = \prod_{i=1}^k\left(\frac{s_i^2}{s_0^2}\right)^{n_i/2}. \qquad (26.70)$$

Similarly it may be shown that

$$\lambda_{H_1} = \prod_{i=1}^k\left(\frac{s_i^2}{s_a^2}\right)^{n_i/2}, \qquad (26.71)$$

where

$$s_a^2 = \frac{1}{N}\sum_{i=1}^k n_i\,s_i^2, \qquad (26.72)$$

and also that

$$\lambda_{H_2} = \left(\frac{s_a^2}{s_0^2}\right)^{N/2}. \qquad (26.73)$$

It will be noticed that $\lambda_H = \lambda_{H_1}\lambda_{H_2}$.
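
The three criteria of 26.36 are easy to evaluate for given data. The following sketch (function and variable names are hypothetical) computes $\lambda_H$, $\lambda_{H_1}$ and $\lambda_{H_2}$ from a list of samples, using variances with divisor $n_i$ as in the text, and can be used to confirm the factorisation $\lambda_H = \lambda_{H_1}\lambda_{H_2}$ numerically. It assumes every sample has a non-zero variance.

```python
import math


def k_sample_criteria(samples):
    """Return (lambda_H, lambda_H1, lambda_H2) for k normal samples."""
    sizes = [len(s) for s in samples]
    total = sum(sizes)                                    # N
    means = [sum(s) / len(s) for s in samples]
    variances = [sum((x - m) ** 2 for x in s) / len(s)    # s_i^2, divisor n_i
                 for s, m in zip(samples, means)]
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    s_a2 = sum(n * v for n, v in zip(sizes, variances)) / total
    s_02 = s_a2 + sum(n * (m - grand_mean) ** 2
                      for n, m in zip(sizes, means)) / total
    # Work on the log scale to avoid underflow for large samples.
    log_h = 0.5 * sum(n * math.log(v / s_02) for n, v in zip(sizes, variances))
    log_h1 = 0.5 * sum(n * math.log(v / s_a2) for n, v in zip(sizes, variances))
    log_h2 = 0.5 * total * math.log(s_a2 / s_02)
    return math.exp(log_h), math.exp(log_h1), math.exp(log_h2)


if __name__ == "__main__":
    print(k_sample_criteria([[1, 2, 3], [2, 4, 6, 8], [5, 5, 6]]))
```

Each criterion lies between 0 and 1, small values counting against the corresponding hypothesis.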


26.37. The function $\lambda_{H_2}$ may be related to the correlation ratio $\eta^2$. We have

$$N s_0^2 = N s_a^2 + \Sigma\,n_i(\bar{x}_i - \bar{x}_0)^2, \qquad (26.74)$$

and hence

$$\lambda_{H_2} = \left(\frac{s_a^2}{s_0^2}\right)^{N/2} = (1 - \eta^2)^{N/2}. \qquad (26.75)$$

The distribution of $\lambda_{H_2}$ is thus obtainable directly from the known form for $\eta^2$ in samples
from an uncorrelated population.

We also find

$$(\lambda_{H_1})^{2/N} = \frac{1}{s_a^2}\left\{\prod (s_i^2)^{n_i}\right\}^{1/N}, \qquad (26.76)$$

$$(\lambda_H)^{2/N} = \frac{1}{s_0^2}\left\{\prod (s_i^2)^{n_i}\right\}^{1/N}. \qquad (26.77)$$

The distribution of $(\lambda_{H_2})^{2/N}$ is that of $1 - \eta^2$, where the distribution of $\eta^2$ is

$$dF \propto (\eta^2)^{\frac{k-3}{2}}\,(1 - \eta^2)^{\frac{N-k-2}{2}}\,d\eta^2. \qquad (26.78)$$

It can accordingly be tested in this distribution or the related $z$-form. This is, in fact,
the criterion used in the analysis of variance for homogeneity tests, and it is interesting to
remark that the $z$-test here arises in considering the hypothesis that the various distributions
parent to the sample values, being already known to have the same variance, have the
same mean. The other form of hypothesis, $H$, is that the samples come from the same
population, and the equality of variance is not part of the data but part of the hypothesis.
We are not then surprised, or should not be so, to find that the $\lambda_H$ criterion leads to a
different test.
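
The link between $\lambda_{H_2}$, the correlation ratio and the analysis-of-variance ratio can be made concrete. In the sketch below the function name is hypothetical, and the conversion from $\eta^2$ to the variance ratio $F = \{\eta^2/(k-1)\}/\{(1-\eta^2)/(N-k)\}$ is the standard one, assumed here rather than taken from the text.

```python
def correlation_ratio_and_f(samples):
    """Return (eta2, lambda_H2 ** (2/N), F) for k samples.

    eta2 is the between-sample sum of squares divided by the total sum
    of squares; lambda_H2 ** (2/N) equals 1 - eta2 by (26.75).
    """
    k = len(samples)
    sizes = [len(s) for s in samples]
    total = sum(sizes)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / total
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    within = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
    eta2 = between / (between + within)
    f_ratio = (eta2 / (k - 1)) / ((1.0 - eta2) / (total - k))
    return eta2, 1.0 - eta2, f_ratio


if __name__ == "__main__":
    print(correlation_ratio_and_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]]))
```

In the usage line the between-sample sum of squares is 42 and the within-sample sum is 6, so $\eta^2 = 0.875$ and the variance ratio is 21.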

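26.38. The moments of the distribution of $\lambda_H$ may be obtained as follows. The
joint distribution of $\bar{x}_i$ and $s_i^2$ is

$$dF \propto \prod (s_i)^{n_i-3}\exp\left[-\Sigma\,\frac{n_i}{2\sigma^2}\{s_i^2 + (\bar{x}_i-\mu)^2\}\right]\prod d\bar{x}_i \prod ds_i^2. \qquad (26.79)$$

The distribution of means is independent of that of variances and can be ignored.
Further, if

$$\chi^2 = \frac{1}{\sigma^2}\,\Sigma\,n_i(\bar{x}_i - \bar{x}_0)^2,$$

then $\chi^2$ is also independent of the variances, and we have

$$dF \propto \prod (s_i)^{n_i-3}\exp\left(-\Sigma\,\frac{n_i s_i^2}{2\sigma^2}\right)\chi^{k-3}\exp(-\tfrac{1}{2}\chi^2)\prod ds_i^2\,d\chi^2. \qquad (26.80)$$

Put now

$$\psi_i = \frac{n_i s_i^2}{N s_0^2}, \qquad (26.81)$$

and note that

$$\sigma^2\chi^2 = N s_0^2 - \Sigma\,n_i s_i^2 = N s_0^2\,(1 - \Sigma\,\psi_i). \qquad (26.82)$$

Transforming to variables $\psi$ and $s_0$, we find

$$dF \propto \prod \psi_i^{\frac{n_i-3}{2}}\,(1 - \Sigma\,\psi_i)^{\frac{k-3}{2}}\prod d\psi_i \; s_0^{N-3}\exp\left(-\frac{N s_0^2}{2\sigma^2}\right)ds_0^2,$$

whence, for the distribution of the $\psi$'s,

$$dF \propto \prod \psi_i^{\frac{n_i-3}{2}}\,(1 - \Sigma\,\psi_i)^{\frac{k-3}{2}}\prod d\psi_i. \qquad (26.83)$$

Now

$$\lambda_H = \prod\left(\frac{N\psi_i}{n_i}\right)^{n_i/2}, \qquad (26.84)$$

and hence we may find the moments of $\lambda_H$ by integrating its powers over the distribution
(26.83). Integrals of this kind, known as Dirichlet's, are expressible in terms of gamma
functions and we find, for the $p$th moment of $\lambda_H$ about zero,

$$\mu_p' = N^{\frac{pN}{2}}\,\frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N(1+p)-1}{2}\right)}\prod_{i=1}^k \frac{\Gamma\left(\frac{n_i(1+p)-1}{2}\right)}{n_i^{\frac{p n_i}{2}}\,\Gamma\left(\frac{n_i-1}{2}\right)}. \qquad (26.85)$$

When all the $n$'s are equal to $n$, so that $N = kn$, this reduces to

$$\mu_p' = k^{\frac{pN}{2}}\,\frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N(1+p)-1}{2}\right)}\left\{\frac{\Gamma\left(\frac{n(1+p)-1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\right\}^k. \qquad (26.86)$$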
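
A moment formula of this kind can be checked by simulation. The sketch below is an illustration under explicit assumptions: it takes the $p$th moment of $\lambda_H$ to be the Dirichlet-integral expression $N^{pN/2}\,\Gamma(\frac{N-1}{2})/\Gamma(\frac{N(1+p)-1}{2})\,\prod \Gamma(\frac{n_i(1+p)-1}{2})/\{n_i^{pn_i/2}\,\Gamma(\frac{n_i-1}{2})\}$, and compares it with a Monte Carlo average over samples drawn from a single normal population. The function names, sample sizes, number of trials and seed are arbitrary choices.

```python
import math
import random


def exact_moment(p, sizes):
    """p-th moment of lambda_H about zero from the gamma-function formula."""
    total = sum(sizes)
    log_m = (0.5 * p * total * math.log(total)
             + math.lgamma((total - 1) / 2)
             - math.lgamma((total * (1 + p) - 1) / 2))
    for n in sizes:
        log_m += (math.lgamma((n * (1 + p) - 1) / 2)
                  - 0.5 * p * n * math.log(n)
                  - math.lgamma((n - 1) / 2))
    return math.exp(log_m)


def simulated_moment(p, sizes, trials=20000, seed=7):
    """Monte Carlo estimate of E[lambda_H ** p] when H is true."""
    rng = random.Random(seed)
    total = sum(sizes)
    acc = 0.0
    for _ in range(trials):
        means, variances = [], []
        for n in sizes:
            xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
            m = sum(xs) / n
            means.append(m)
            variances.append(sum((x - m) ** 2 for x in xs) / n)
        x0 = sum(n * m for n, m in zip(sizes, means)) / total
        s02 = sum(n * ((m - x0) ** 2 + v)
                  for n, m, v in zip(sizes, means, variances)) / total
        log_lam = 0.5 * sum(n * math.log(v / s02)
                            for n, v in zip(sizes, variances))
        acc += math.exp(p * log_lam)
    return acc / trials


if __name__ == "__main__":
    sizes = (5, 6, 7)
    print(exact_moment(1.0, sizes), simulated_moment(1.0, sizes))
```

Two sanity checks are built into the formula itself: the zeroth moment is 1, and for a single sample ($k = 1$) the criterion is identically 1, so every moment is 1.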


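26.39. For the criterion $\lambda_{H_1}$ we start from the distribution

$$dF \propto \prod s_i^{n_i-3}\exp\left(-\frac{1}{2\sigma^2}\,\Sigma\,n_i s_i^2\right)\prod ds_i^2, \qquad (26.87)$$

and on putting

$$\zeta_i = \frac{n_i s_i^2}{N s_a^2} \qquad (26.88)$$

we find, in much the same way as before,

$$dF(\zeta_1 \ldots \zeta_{k-1}) \propto \prod_{i=1}^{k-1}\zeta_i^{\frac{n_i-3}{2}}\left(1 - \sum_{i=1}^{k-1}\zeta_i\right)^{\frac{n_k-3}{2}}\prod_{i=1}^{k-1} d\zeta_i. \qquad (26.89)$$

Further,

$$\lambda_{H_1} = \prod_{i=1}^k\left(\frac{N\zeta_i}{n_i}\right)^{n_i/2}, \qquad (26.90)$$

whence we find

$$\mu_p' = N^{\frac{pN}{2}}\,\frac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N(1+p)-k}{2}\right)}\prod_{i=1}^k \frac{\Gamma\left(\frac{n_i(1+p)-1}{2}\right)}{n_i^{\frac{p n_i}{2}}\,\Gamma\left(\frac{n_i-1}{2}\right)}. \qquad (26.91)$$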


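26.40. For large $n_i$ we find, in virtue of the Stirling approximation to the gamma
function,

(1) for $\lambda_H$, $\mu_p' \to \dfrac{1}{(p+1)^{k-1}}$;

(2) for $\lambda_{H_1}$, $\mu_p' \to \dfrac{1}{(p+1)^{\frac{k-1}{2}}}$;

(3) for $\lambda_{H_2}$, $\mu_p' \to \dfrac{1}{(p+1)^{\frac{k-1}{2}}}$.

These are the moments of the distributions

(1) $$dF = \frac{(-\log x)^{k-2}}{\Gamma(k-1)}\,dx,$$

(2) and (3) $$dF = \frac{(-\log x)^{\frac{k-3}{2}}}{\Gamma\left(\frac{k-1}{2}\right)}\,dx.$$

Hence, by the transformation $x = e^{-\frac{1}{2}\chi^2}$ we see that approximately $-2\log\lambda_H$ is distributed as
$\chi^2$ with $\nu = 2k - 2$, and $-2\log\lambda_{H_1}$ and $-2\log\lambda_{H_2}$ as $\chi^2$ with $\nu = k - 1$.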
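
The correspondence between the limiting moments $(p+1)^{-m}$ and the density $(-\log x)^{m-1}/\Gamma(m)$ on $(0, 1)$ can be checked by direct numerical integration. The sketch below is illustrative; the function name, the integration range and the step count are assumptions. It uses the substitution $x = e^{-t}$, under which the $p$th moment becomes $\int_0^\infty e^{-(p+1)t}\,t^{m-1}\,dt/\Gamma(m)$.

```python
import math


def limiting_moment(p, m, upper=60.0, steps=60000):
    """E[x**p] for the density (-log x)**(m-1) / Gamma(m) on (0, 1).

    Evaluated by the midpoint rule after substituting x = exp(-t);
    'upper' truncates the infinite range of t, where the integrand
    is negligible. Intended for m >= 1.
    """
    h = upper / steps
    acc = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        acc += math.exp(-(p + 1.0) * t) * t ** (m - 1.0)
    return acc * h / math.gamma(m)


if __name__ == "__main__":
    # m = k - 1 for lambda_H; m = (k - 1) / 2 for lambda_H1 and lambda_H2.
    print(limiting_moment(1.5, 2.0), 2.5 ** -2)
```

Since $x = e^{-\chi^2/2}$ turns this density into that of $\chi^2$ with $2m$ degrees of freedom, $m = k - 1$ gives $\nu = 2k - 2$ and $m = (k-1)/2$ gives $\nu = k - 1$.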


26.41. For small samples Neyman and Pearson have suggested approximating to
the distributions of $\lambda_H^{2/N}$ and $\lambda_{H_1}^{2/N}$ by identifying their lower moments with those of the
form

$$dF \propto x^{m_1-1}\,(1 - x)^{m_2-1}\,dx.$$

This possibility has been examined in detail by Nayer (1936) for the hypothesis $H_1$ when
all the $n$'s are equal. The distribution of $\lambda_H$ has also been studied by Wilks and Thompson
(1937a).


26.42. Modified forms of the above tests have been considered by various authors. 
We may write 

log hu “i Z n i !°g^.(26.92) 

s a 


where, of course, 




2 

In short, ^ is a weighted mean of the sf and (k Hl ) N is a weighted geometric mean. Bartlett 
(1937c) has proposed using the degrees of freedom v i (— n i — 1) instead of n i in these 
equations, that is to say, defines a criterion 


2 

/* V 



. (26.93) 


This test is, in the sense defined in the next chapter, unbiassed, whereas that based on $\lambda_{H_1}$ is not. Bartlett also suggested as an approximation that $-2/c$ times the logarithm of his criterion could be regarded as distributed as $\chi^2$ with $k - 1$ degrees of freedom, $c$ being given by

$$c = 1 + \frac{1}{3(k-1)}\left\{\sum_{i=1}^{k}\frac{1}{\nu_i} - \frac{1}{\sum\nu_i}\right\}. \tag{26.94}$$

This has recently been reconsidered by Hartley (1940), who showed that it is not very exact for large $k$ and gave a better approximation which can be reduced to tabular form. Cf. Exercise 27.2.
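
As an illustration, Bartlett's modification can be sketched in a few lines. This assumes the criterion in its usual modern form, $M = \nu\log s^2 - \sum\nu_i\log s_i^2$ with $s_i^2$ the unbiased variance estimates and $s^2$ their mean weighted by the degrees of freedom, so that $M$ is $-2$ times the logarithm of the criterion. The function name and argument layout are hypothetical.

```python
import math

def bartlett_statistic(variances, dfs):
    """Return (M, c, M / c) for Bartlett's modified criterion.

    variances: unbiased variance estimates s_i^2 (divisor nu_i)
    dfs:       the corresponding degrees of freedom nu_i
    M / c is referred to chi-square with k - 1 degrees of freedom.
    """
    k = len(variances)
    nu = sum(dfs)
    pooled = sum(v * d for v, d in zip(variances, dfs)) / nu
    # M = nu log s^2 - sum nu_i log s_i^2  (never negative, zero when all s_i^2 agree)
    m = nu * math.log(pooled) - sum(d * math.log(v) for v, d in zip(variances, dfs))
    # Correction factor c of Bartlett's approximation
    c = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / nu) / (3.0 * (k - 1))
    return m, c, m / c
```

For six samples with five degrees of freedom each the factor is $c = 1 + \frac{1}{15}\left(\frac65 - \frac1{30}\right)$, a correction of about 8 per cent.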
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Likelihood, Criteria for the Linear Hypothesis 

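26.43. We now proceed to consider the application of the likelihood criterion to the class of linear hypothesis as defined in 26.30. We have, for the likelihood function,

$$p_0 = \left\{\frac{1}{\sigma\sqrt{(2\pi)}}\right\}^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j-\mu_j)^2\right\}. \tag{26.95}$$

Writing $S^2 = \sum (x_j - \mu_j)^2$ we have, for the stationary values of $p_0$ with respect to $\sigma$ and the parameters $\theta$ (related to the $\mu$'s by (26.51)),

$$\frac{\partial}{\partial\sigma}\log p_0 = -\frac{n}{\sigma} + \frac{S^2}{\sigma^3} = 0 \tag{26.96}$$

$$\frac{\partial}{\partial\theta_j}\log p_0 = \frac{1}{\sigma^2}\sum_{k=1}^{n}(x_k-\mu_k)\,c_{jk} = 0. \tag{26.97}$$

This last equation is clearly the one we should get if we were seeking to minimise $S^2$ itself for variations in the $\theta$'s. Let $nS_a^2$ be this minimum value. We shall then have, from (26.96),

$$\sigma^2 = S_a^2. \tag{26.98}$$

The maximum of $p$ in the class $\Omega$ of admissible hypotheses is then

$$p(\Omega\ \text{max.}) = \frac{e^{-\frac12 n}}{(2\pi)^{\frac12 n}\,S_a^{\,n}}. \tag{26.99}$$

Similarly the maximum of $p$ in the class $\omega$ for which $\theta_1 \ldots \theta_r$ are fixed and the other $s$ $\theta$'s vary, is found to be

$$p(\omega\ \text{max.}) = \frac{e^{-\frac12 n}}{(2\pi)^{\frac12 n}\,(S_a^2+S_b^2)^{\frac12 n}}, \tag{26.100}$$

where $n(S_a^2 + S_b^2)$ is the minimum of $S^2$ under the conditions that $\theta_1 \ldots \theta_r$ are fixed. Thus we find for the likelihood ratio $\lambda$

$$\lambda^{2/n} = \frac{1}{1 + \dfrac{S_b^2}{S_a^2}} \tag{26.101}$$

or, if more convenient, we may use the function

$$Z = \frac{S_b}{S_a}$$

to provide a criterion.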
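
A small worked case may make these quantities concrete. The sketch below takes the simplest linear hypothesis, a straight-line regression $y = a + bx$ with $H_0$: $b = 0$, so that one parameter is fixed ($r = 1$) and one is left free ($s = 1$). The model, the function name and the data are illustrative assumptions, not taken from the text. Here $nS_a^2$ is the residual sum of squares of the fitted line and $n(S_a^2 + S_b^2)$ is the sum of squares about the mean.

```python
import math

def likelihood_criteria(x, y):
    """Criteria for H0: slope = 0 in y = a + b*x + error (r = 1 fixed, s = 1 free)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    sa2 = (syy - sxy ** 2 / sxx) / n        # S_a^2: minimum of S^2 / n over a and b
    sb2 = (sxy ** 2 / sxx) / n              # S_b^2: increase in the minimum when b = 0
    lam = (sa2 / (sa2 + sb2)) ** (n / 2)    # likelihood ratio lambda
    big_z = math.sqrt(sb2 / sa2)            # Z = S_b / S_a
    f_ratio = (sb2 / 1) / (sa2 / (n - 2))   # variance ratio on (1, n - 2) d.f.
    return lam, big_z, f_ratio

x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
lam, big_z, f_ratio = likelihood_criteria(x, y)
```

In this special case $\lambda^{2/n} = 1/(1+Z^2)$ and the variance ratio equals $(n-2)Z^2$, so $\lambda$, $Z$ and the usual variance-ratio test all order the samples in the same way.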

Now we make the transformation (26.54) and show that the values S a and S b as we 
have defined them here have, in fact, the values given by (26.56) and (26.69). We have, 
from (26.54), 


n r r+8 n > 2 

H C i ky i\ 

l 1 i-r+8+1 J 

n n 

= + Y(Zc Jk y ,) 2 

fc -1 fc -1 

-i> «»«,)*+ z & 
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Since n 8% is the minimum of S 2 for all variations of the 0’s and E and y are independent 
of the 0’s, we must have 

nS'i^Eyj. 

Also, since nS b is the minimum of S 2 when the values 0, . . . 0 r are fixed, it is seen to have 
the value given in (26.59). 

We have also 


S 2 + nS* . (26.102) 

where nS 2 = £ f £c jk E j 

k~\ \ jT\ 

and the frequency function of E’h and y' s is given by 

/(£?!... E r+ „, y r+K+l . . . y„) oc exp | - (fl» + Ng) j. . (26.103) 

Now nSl is the sum of squares of n - r — s normal variates, and hence 

f(8„) oc 1 exp ( - . . . (26.104) 

Hence, since the E *s are independent of the y s, and since 8* depends only on the y' s, 

f(S a , E l . . . E r+ „) oc 8”- 1 exp | - £- (S 2 + S 2 ) |. . (26.105) 

We have seen, in effect, that n Si is the minimum value of S^. It depends on E x ... E r 
and hence is independent of S* and is distributed as 

f(Sb) « Si lex p(-^ 2 )- 



Thus we have 

f(S„, s b ) cc S” -- 1 flj-' exp j- 2 ” 2 (SI + Si) j. 
Putting now Z = S h /S a , we find 

f(Z) oc Z r+S ~ l (1 + Z 2 )~ V • 
which may be reduced to Fisher’s form by putting 

s - J log ---si- = log Z + £ log 




. (26.106) 
. (26.107) 

. (26.108) 


We have thus reduced the test of the linear hypothesis to the $z$-test and it is seen that several of the tests introduced in Chapter 21 can be justified on the likelihood criterion. These include the “Student” test for one mean, the extended form for the difference of two means, and the test for the ratio of variances. Certain other tests in which the $z$-distribution (which, of course, reduces to the $t$-distribution for $\nu_1 = 1$) appears, such as that of the correlation ratio, the multiple correlation coefficient and regression coefficients, also depend on the linear hypotheses, and in the light of the theory here presented are seen to be different aspects of the same thing, at least so far as the testing of hypotheses is concerned.


26.44. We will indicate briefly, without going into the complicated mathematics involved, some interesting results obtained by P. C. Tang (1938) and P. L. Hsu (1941b) concerning the power of the $z$-test as applied to linear hypotheses.

The functions $S_a^2$ and $S_b^2$, as we have seen, are distributed independently in the $\chi^2$-form, and their ratio accordingly in Fisher's form. From this viewpoint the test of the linear hypothesis is a generalisation of the test of homogeneity in the analysis of variance. Tang considers the distribution of

$$E^2 = \frac{S_b^2}{S_a^2 + S_b^2} \tag{26.109}$$

and the variation for errors of the second kind, namely, when the values $\theta_1 \ldots \theta_r$ are different from the specified values. He shows that the power of the test depends, not on individual alternative values, but on a single function of the $\theta$'s. He also obtains the power function and tabulates it.

Hsu then considers other possible tests which are based on this single function and shows that in this class of test the $z$-test or the equivalent $E^2$-test is the uniformly most powerful.


26.45. For large samples, when maximum likelihood estimators of the parameters exist, the distribution of $-2\log\lambda$ is that of $\chi^2$ with $s$ degrees of freedom. For the distribution may then be written (see 17.46):

dF = A exp j- 127 g jk 0 } - 6 } ) 0 k - 0 k ) j d0 x . . . dd r+a 

so that p (Q max.) = A. . . . . (26.110) 

If 6 t ... 6 r are fixed the likelihood becomes 

p = A exp j- | Zg) k z } z k - koj. 

r 

where xl = £ 9* & ~ fy) fa ~ 0 k ) .(26.111) 

j, *=i 

and z'j is given by 6j — 0, — L t where L } is a linear function of the r specified parameters. 
Thus— 

p (to max.) = A 0 .(26.112) 

where $A_0$ is the value of $A$ when $\theta_j$ takes its true value $\theta_{j0}$. Thus, when $H_0$ is true,

$$\lambda = e^{-\frac12\chi_0^2}. \tag{26.113}$$

But the characteristic function of $\chi_0^2$ $(= -2\log\lambda)$ is

j p 0 e lix * dfi j . . . d$ r+a 

= A J exp j - | Z g'j k z) z k + xl (** - i) j <$i • . . d$ r+B 

oc -i—_.(26.114) 

(1 - 2it)\ 

This is the characteristic function of a quantity distributed as $\chi^2$ with $s$ degrees of freedom, and hence the result follows.

26.46. In concluding this chapter we may mention briefly a question which frequently presents itself when statistical hypotheses are being tested in practice. Our tests are based on the observed values obtained in the sampling process, and in order to apply them we require no prior knowledge of the parameters to which they relate. They can be used in a state of complete ignorance about the parameters. But suppose some information is already available; or suppose that we attach varying degrees of importance to the avoidance of particular types of error. How far are the tests developed in this chapter to be modified?


26.47. Consider, for example, the situation which has already been mentioned in 
connection with the theory of estimation, of the chemist who is assaying the strength of 
a particular drug. If the drug has harmful effects in large quantities it may be much more 
important for him to detect cases in which the true strength exceeds his hypothetical value 
than when the true strength is deficient. Again, the manufacturer of a “guaranteed”
product is usually much more concerned with ensuring that it does not fall below the 
guaranteed standard than that it exceeds such standard. In such circumstances we may 
be particularly interested in “one-sided” tests of the type $\zeta < \zeta_0$, and as we have seen,
there more often occur U.M.P. tests for this class of alternative than in the case when $\zeta$
can have any value. We might, therefore, be quite ready to accept such a test, knowing 
quite well that it may be insensitive in part of the range of the unknown parameter, merely 
because errors in that range are relatively unimportant. 

Similarly we might be willing to accept a test which had a poor discriminatory power
in part of the range but compensating advantages elsewhere, simply because we know 
beforehand that values of the parameter rarely or never fall into that particular part of 
the range. This is equivalent to prior knowledge of the distribution of the values 
determining the alternative hypotheses. 

26.48. It is difficult to reduce rather vague prior knowledge of a parameter to numerical
form, and hence to extend our theory with great precision to cover these cases; but in
practice it is desirable to consider, before adopting a test, whether any prior knowledge is 
available, or whether our interests centre on particular parts of the range. If they do, we 
may consider the behaviour of power functions of the possible tests at our disposal and 
examine which is the more powerful test in the particular part of the range which interests 
us most. The mere fact that the theory developed in this and the succeeding chapter 
makes no assumptions about the prior probabilities of admissible alternatives does not 
mean that we should be acting sensibly in ignoring any prior information which may be 
at hand when applying the theory, or that we need feel compelled to apply tests with 
optimum properties in regions where we know the unknown parameter-values will not fall. 


NOTES AND REFERENCES 

The theory of this chapter is very largely due to Neyman and E. S. Pearson, whose 
treatment has been closely followed. In their first contribution to the subject (1928) the 
likelihood criterion was developed, the theory of first and second kind of errors and power 
of tests being given in 1933. For the theory of unbiassed tests, see the papers of 1936 and 
1938. In the last few years the literature has grown considerably. 

Feller (1938) has shown that similar regions only exist in rather exceptional circumstances and that the theory of composite hypotheses is incomplete. Tables of certain power functions and distributions associated with likelihood tests are given by Mahalanobis (1933), Neyman and Tokarska (1936b), Wilks and Thompson (1937a), P. C. Tang (1938), David (1939), Nayer (1936), and in Tables for Statisticians, Part II (Tables 36-37). See also Mahalanobis (1933).

For tests based on the likelihood ratio, see Neyman and Pearson (1928, 1931a, 1931b), Pearson and Wilks (1933b), Wilks (1936a), Nayer (1936), Welch (1936a), R. W. Jackson (1936), Sukhatme (1936b), Bartlett (1937c), Wilks and Thompson (1937a), Wilks (1938a), Bishop (1939), G. W. Brown (1939), Mood (1939), Hartley (1940), Wald and Brookner (1941b).

For the general theory, see also Welch (1936), Kolodzieczyk (1936), Neyman (1936b, 1937b, 1938b), Daly (1940), Pitman (1939b), Wald (1939a, 1941a), Wolfowitz (1942), E. S. Pearson (1941, 1942a), Dantzig (1940), P. L. Hsu (1941b), Simaika (1941), MacStewart (1941), Scheffé (1942a, 1943).


EXERCISES 

26.1. Examine the following argument: To accept H when it is false is equivalent 
to rejecting not -H when not -H is true. Hence, if K = not -H, to commit an error of the 
second kind for H is to commit an error of the first kind for K ; and thus there is 
no distinction between the first and second kinds of error. 


26.2. For the distribution

$$dF = \beta e^{-\beta(x-\gamma)}\,dx, \qquad x > \gamma$$
$$\phantom{dF} = 0, \qquad x < \gamma$$

show that for a hypothesis $H_0$ that $\beta = \beta_0$, $\gamma = \gamma_0$ and an alternative $H_1$ that $\beta = \beta_1$, $\gamma = \gamma_1$, the best critical region is the region $W_0$ where $p_0 = 0$, together with the region $W_+$ defined by

$$\bar{x} < \frac{1}{\beta_1-\beta_0}\left\{\gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\frac{\beta_1}{\beta_0}\right\},$$

provided that the admissible hypothesis is restricted by the conditions $\gamma_1 < \gamma_0$, $\beta_1 > \beta_0$. Hence show that a U.M.P. test exists in such circumstances.

(Neyman and Pearson, 1936a. This shows that a U.M.P. test can exist for more than one unknown parameter.)


26.3. If the distribution function of x x . . . x n is given by 


dF = - 


i r j / 7i^ \ 2 7i ■v 

=—-« ex p - ^2 {X x > - n A - h y*) [ d Xi ... dx n 

o n ( 2 n)t 1 ° * 1=2 J 


y, a > 0, — oo < Xi . . . x n < oo 

show that the frequency function may be put in the form 

/ « ,*p ( - n - <|= ?>') „p ( - ( f *;) ; 

and hence that x is a shared ” estimator sufficient for y and a. Show further that the 
best critical regions for y„ a 0 difFer according as a* > <rf„ a* < or a — a 0 , and that 
their boundaries depend on y. Hence no U.M.P. test exists for admissible alternatives 
o> 0 . 


(Neyman and Pearson, 1936a.) 
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26.4. In the previous exercise put a —y and consider the class of hypothesis y > 0. 
Show that there are different best critical regions according as y> y*, y •< y t and that 
their boundaries depend on y. Hence there is no U.M.P. test, but x is sufficient for y. 

(Neyman and Pearson, 1936a.) 

26.5. In samples from a normal population, show that the probability of accepting 
the hypothesis that the mean fi < when, in fact, it is false and /i — fi t > /i a —that is, 
the probability of an error of the second kind—is 

/n\» 1 r* 

where 

a 

•Xs _// 

and t is the value of-corresponding to the significance level 1 — a for the control 

s 

of errors of the first kind. 

(Neyman and Tokarska, 19366.) 


v n ~ l exp ( — — \ — i— f du dv 


1 — P 0 


26.6. In six samples of six members each the following values were obtained:

Sample.    Mean.    Variance.
   1       8433      24,722
   2       8200      94,133
   3       7933     149,733
   4       8120      45,037
   5       7971      88,480
   6       8263      49,921

with $s_0^2 = 104,688$, $s_a^2 = 75,338$.

Show that $\lambda_{H_1}^{2/N} = 0.8508$ and $\lambda_H^{2/N} = 0.6219$. The 5-per-cent. levels are respectively 0.67 and 0.54, so that there is no evidence of heterogeneity.

(Pearson, appendix to papers by Wilsdon, 1934).
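
The first of these figures can be checked with a few lines of arithmetic. The sketch assumes that the unlabelled third column of the table holds the sample variances $s_i^2$, and that with equal sample sizes $\lambda_{H_1}^{2/N}$ is the ratio of the geometric mean of the $s_i^2$ to their arithmetic mean $s_a^2$. The function name is hypothetical.

```python
import math

# Sample variances s_i^2 as read from the table (an assumption about the column)
variances = [24722, 94133, 149733, 45037, 88480, 49921]

def criterion_h1_equal_n(variances):
    """lambda_{H1}^{2/N} for equal sample sizes: geometric mean over arithmetic mean."""
    k = len(variances)
    sa_sq = sum(variances) / k                                  # s_a^2, arithmetic mean
    geo = math.exp(sum(math.log(v) for v in variances) / k)     # geometric mean of s_i^2
    return geo / sa_sq, sa_sq

l1, sa_sq = criterion_h1_equal_n(variances)
```

Under these assumptions `sa_sq` reproduces the quoted $s_a^2$ and `l1` agrees with the quoted value of $\lambda_{H_1}^{2/N}$ to about three decimal places.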

26.7. Verify that the likelihood ratio leads to “ Student’s ” test for an unknown 
mean in normal samples, to the use of Fisher’s z in testing the equality of two variances, 
and to the t-test for the difference of two means in normal populations with the same 
variance. 


26.8. If samples n t ... n k are drawn from the populations 
dF = - exp ^ — Z - — ^ dx, i = 1 ... A 


use the likelihood ratio to test the hypothesis H 0 that the populations are identical, 
showing that 




a, 


n (x t - x n y“ 

I-I_ 

(*. - x.i) s 


Tll'Ii, say, 


a.s.— von. h. 


X 
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where is the mean of the ith sample, x (1 is the smallest member of that sample, z 0 is the 
mean of all samples together and x A is the smallest value in all samples together. 

Show that the distribution of x n and is 


and hence the moments of L, are 

N*r(N 


"4 h 


/<p - 


rw + p-i,,., | n ^ r(n< _ 1) 


If H 1 is the hypothesis that the populations have the same a but any possible different 
/Fs, show that 

jn _j _ nip 

--p-. 

where l is the weighted mean of the Z*s, and that 




N»rjN-k) 

* nx *'» { «,”/’<«, -1) ; 

. If H 2 is the hypothesis that the populations, being known to have identical or’s, have 
the same j9, show that the distribution of 

L t = X H h = - 

lo 

“ " ~ rW^W-i) **- (1 - Lif -' dL '- 

(Sukhatme, 19366). 


26.9. In the notation of 26.36 show that, if $H$ is true, the criteria $\lambda_{H_1}$ and $\lambda_{H_2}$ are distributed independently.


(Neyman and Pearson, 1931b).
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Bias in Statistical Tests 

27.1. In considering the problem of estimation by confidence intervals in Chapter 19 we had occasion to remark on the rather arbitrary nature of determining the interval so that both inequalities $\theta_1 < \theta$ and $\theta < \theta_2$ had an equal chance $\tfrac12\alpha$ of fulfilment. A point of a similar nature arises in the testing of hypotheses, particularly when an asymmetrical sampling distribution for the criterion is concerned. Consider, for instance, the testing of the hypothesis that in a normal sample of $n$ members the standard deviation $\sigma$ has an assigned value $\sigma_0$ irrespective of the mean $\mu$. As we have seen in Example 26.3, there is no U.M.P. test for all $\sigma > 0$, though there is one for $\sigma > \sigma_0$ and another for $\sigma < \sigma_0$. In choosing a test to cover the whole range $\sigma > 0$ we have, therefore, a certain freedom of choice, since there exists no “best” test as we have previously defined the term. A common test in practical use is to take the sample variance $s^2$ and accept the hypothesis $\sigma = \sigma_0$ if and only if

$$s_1^2 < s^2 < s_2^2 \tag{27.1}$$

where $s_1^2$ and $s_2^2$ are determined from the distribution of $s^2$, namely

$$dF \propto s^{n-2}\exp\left(-\frac{ns^2}{2\sigma^2}\right)ds \tag{27.2}$$

such that

$$\int_0^{s_1} dF = \int_{s_2}^{\infty} dF = \tfrac12(1-\alpha). \tag{27.3}$$

In short, $s_1^2$ and $s_2^2$ are chosen so as to cut off equal “tail” areas of the distribution. This procedure will, of course, control errors of the first kind; but so equally well would the selection of $s_1^2$ and $s_2^2$ so that

$$\int_0^{s_1} dF = \tfrac12 - \alpha_1 \tag{27.4}$$

and

$$\int_{s_2}^{\infty} dF = \tfrac12 - \alpha_2 \tag{27.5}$$

provided that $\alpha_1 + \alpha_2 = \alpha$. Thus we have an infinite number of regions which will control errors of the first kind. It is natural to seek for some criterion which will distinguish one as better than the others, recognizing that no U.M.P. test exists.


27.2. Such a criterion arises naturally from the following consideration. In the example given, with $\alpha_1 = \alpha_2 = \tfrac12\alpha$, let us calculate the power of the test for different values of $\sigma$. This can readily be done from the distributions of type (27.2) by means of the incomplete $\Gamma$-function or the equivalent $\chi^2$ integral. For any given $\sigma$ we have to find

$$\beta(s_1^2, s_2^2 \mid \sigma) = \int_0^{s_1} dF + \int_{s_2}^{\infty} dF,$$
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. (27.6) 
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where 



. (27.7) 


Fig. 27.1, adapted from Neyman and Pearson (1936), shows the relation between the power function $\beta$ and $\sigma^2$ for $\alpha_1 = \alpha_2 = 0.49$, $n = 3$, the rejection level being 0.02.

[Fig. 27.1. Power Curve in Samples of 3 for $\sigma^2$ from a Normal Population (see text). Horizontal axis: $\sigma^2$ in sampled population, in units of $\sigma_0^2$.]

We see that for $\sigma^2 > 1$ $(= \sigma_0^2)$ the power increases, and so also for $\sigma^2 < \tfrac12$ $(= \tfrac12\sigma_0^2)$. But between $\tfrac12\sigma_0^2$ and $\sigma_0^2$ the power is less than 0.02, i.e. less than $1 - \alpha$. Hence for such values the chance of an error of the second kind, namely, the acceptance of a false hypothesis, would be greater than the chance of an error of the first kind, namely, the rejection of a true hypothesis.
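
The bias of the equal-tails test in this case can be reproduced directly. The sketch assumes, as is standard, that for $n = 3$ the quantity $ns^2/\sigma^2$ is a $\chi^2$ variate with 2 degrees of freedom, whose upper tail probability beyond $c$ is $e^{-c/2}$, so that no tables are needed. The function name and the argument `ratio` $= \sigma^2/\sigma_0^2$ are illustrative.

```python
import math

def equal_tails_power(ratio, size=0.02):
    """Power of the equal-tails variance test for n = 3 at sigma^2 = ratio * sigma_0^2.

    Limits are expressed in units of n s^2 / sigma_0^2, which is chi-square
    with 2 degrees of freedom when the hypothesis is true.
    """
    lower = -2.0 * math.log(1.0 - size / 2.0)   # cuts off a lower tail of size / 2
    upper = -2.0 * math.log(size / 2.0)         # cuts off an upper tail of size / 2
    p_below = 1.0 - math.exp(-lower / (2.0 * ratio))
    p_above = math.exp(-upper / (2.0 * ratio))
    return p_below + p_above

for ratio in (0.25, 0.5, 0.75, 1.0, 2.0):
    print(ratio, round(equal_tails_power(ratio), 5))
```

On these assumptions the power equals the rejection level 0.02 at both $\sigma^2 = \sigma_0^2$ and $\sigma^2 = \tfrac12\sigma_0^2$ and dips below it in between, which is the range of bias described in the text.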


27.3. Whether this is felt to be anomalous depends on the relative importance of the two kinds of error in particular cases; but, other things being equal, it may be felt more important to avoid the second kind than the first, and not to have a greater probability of accepting the hypothesis when it is false than of rejecting it when it is true. This, at any rate, is the basis of the criterion which we proceed to discuss, namely, that the critical region $w$ should be chosen so that $P(E \in w)$ is a minimum when the hypothesis tested is true.

Consider then the case when $H_0$ ascribes to a parameter $\theta$ the value $\theta_0$, and the admissible alternatives ascribe other values to $\theta$ but do not differ from $H_0$ in other respects. We shall say that $w$ is an unbiassed critical region if, and only if,

$$\int_w p(\theta_0)\,dx = P(E \in w \mid \theta_0) = 1 - \alpha, \tag{27.8}$$

and for any other $\theta$, say $\theta'$,

$$\int_w p(\theta')\,dx = P(E \in w \mid \theta') \ge 1 - \alpha. \tag{27.9}$$

Equation (27.8) expresses the usual control of errors of the first kind and (27.9) the minimising property of $w$. If a region is not unbiassed it will be said to be biassed.

27.4. In certain cases there will exist among the unbiassed regions a $w_0$ such that

$$\int_{w_0} p(\theta')\,dx \ge \int_w p(\theta')\,dx \tag{27.10}$$

for all admissible $\theta'$. Such a region may be called the best unbiassed critical region and the test based on it the uniformly most powerful unbiassed test, or briefly the U.M.P.U. test. It minimises the risk of errors of the second kind among the class of unbiassed tests. As we shall see presently, U.M.P.U. tests do in fact exist in certain cases.

The use of the word “ unbiassed ” in this connection is rather arbitrary and is not to 
be interpreted as meaning that biassed tests will give systematically wrong results, or that 
unbiassed tests are based on unbiassed estimators. Fortunately the different uses of 
the term “ bias ” usually occur in different contexts and confusion is infrequent. 


Unbiassed Regions of Type A

27.5. Following Neyman and Pearson, we now define an unbiassed critical region of Type A as one for which

$$P(E \in w \mid \theta_0) = 1 - \alpha, \tag{27.11}$$

$$\left[\frac{\partial}{\partial\theta}P(E \in w \mid \theta)\right]_{\theta=\theta_0} = 0, \tag{27.12}$$

and

$$\left[\frac{\partial^2}{\partial\theta^2}P(E \in w \mid \theta)\right]_{\theta=\theta_0} \ge \left[\frac{\partial^2}{\partial\theta^2}P(E \in w_1 \mid \theta)\right]_{\theta=\theta_0} \tag{27.13}$$

for any other region $w_1$ satisfying (27.11) and (27.12).

We shall, as usual, assume that the differential coefficients exist and shall also assume that differentiation may be carried out under the integral sign, so that we have for all $w$

$$\frac{\partial}{\partial\theta}\int_w p\,dx = \int_w \frac{\partial p}{\partial\theta}\,dx = \int_w p'\,dx, \text{ say}, \tag{27.14}$$

and similarly for the second differential coefficient which we denote by $p''$.

The first condition (27.11) controls errors of the first kind; the second makes the region $w$ locally unbiassed; the third, (27.13), implies that as $\theta$ departs from $\theta_0$ the power function increases more rapidly than for any other unbiassed critical region of the same size. Thus in the neighbourhood of $\theta_0$ the test may be said to be better than others of the unbiassed type. It may not be better for larger values of $|\theta - \theta_0|$, but the Type A tests are based on the supposition that it is more important to detect small errors of the second kind than to minimise the risk of large errors, which will probably be detected in any case.


27.6. The regions of Type A may be found by the use of the following theorem: the region $w_0$ is an unbiassed critical region of Type A if, within $w_0$,

$$p''(\theta_0) \ge k_1\,p'(\theta_0) + k_2\,p(\theta_0), \tag{27.15}$$

and outside $w_0$,

$$p''(\theta_0) \le k_1\,p'(\theta_0) + k_2\,p(\theta_0), \tag{27.16}$$

where $p'(\theta_0) = \left[\partial p/\partial\theta\right]_{\theta=\theta_0}$, etc., and $k_1$, $k_2$ are chosen so as to satisfy (27.11) and (27.12).

Suppose that $F_0 \ldots F_m$ are functions of $x_1 \ldots x_n$ and that

$$\int_w F_j\,dx = c_j, \text{ a constant } (j = 1, \ldots, m). \tag{27.17}$$

Let $w_0$ be a region such that inside it

$$F_0 \ge \sum_{j=1}^{m} k_j F_j \tag{27.18}$$

and outside it

$$F_0 \le \sum_{j=1}^{m} k_j F_j \tag{27.19}$$

where the $k$'s are constants chosen so that $w_0$ satisfies (27.17). Then for any $w$ for which (27.17) is valid

$$\int_w F_0\,dx \le \int_{w_0} F_0\,dx. \tag{27.20}$$

In fact, let $ww_0$ be the common part, if any, of $w$ and $w_0$. As both $w$ and $w_0$ satisfy (27.17), we have

$$\int_{w-ww_0} F_j\,dx = \int_{w_0-ww_0} F_j\,dx. \tag{27.21}$$

Now

$$\int_{w_0} F_0\,dx - \int_w F_0\,dx = \int_{w_0-ww_0} F_0\,dx - \int_{w-ww_0} F_0\,dx$$
$$\ge \int_{w_0-ww_0} \sum (k_jF_j)\,dx - \int_{w-ww_0} \sum (k_jF_j)\,dx$$
$$= 0,$$

in virtue of (27.21).

In our present case take $F_0$ as $p''(\theta_0)$ and $F_1$, $F_2$ as $p'(\theta_0)$, $p(\theta_0)$ respectively. Then (27.20) is true, and hence (27.13) is satisfied if (27.18) and (27.19) are true; and these will be found to reduce to conditions (27.15) and (27.16). The theorem follows.

27.7. If (27.14) holds, and if there exists a sufficient estimator $t$ for $\theta$, then the Type A region is bounded by surfaces of constant $t$. For then we have

$$p(\theta) = p_1(t, \theta)\,p_2(x) \tag{27.22}$$

and hence, from (27.15), on substitution,

$$p_1''(t, \theta_0) \ge k_1\,p_1'(t, \theta_0) + k_2\,p_1(t, \theta_0)$$

within $w_0$ and conversely outside it. The equality must hold on the boundary, which is equivalent to the theorem.


27.8. Writing

$$\phi = \left[\frac{\partial \log p}{\partial\theta}\right]_{\theta=\theta_0} \tag{27.23}$$

$$\phi' = \left[\frac{\partial^2 \log p}{\partial\theta^2}\right]_{\theta=\theta_0} \tag{27.24}$$

we have

$$p'(\theta_0) = \phi\,p(\theta_0)$$
$$p''(\theta_0) = (\phi' + \phi^2)\,p(\theta_0)$$

and hence the inequality (27.15) reduces to

$$\phi' + \phi^2 \ge k_1\phi + k_2 \tag{27.25}$$

within $w_0$, wherever $p(\theta_0)$ does not vanish; and conversely outside $w_0$.

We may distinguish three special cases:

(a) If $\phi'$ is a function of $\phi$, say $F(\phi)$, we have

$$F(\phi) + \phi^2 \ge k_1\phi + k_2, \tag{27.26}$$

and the Type A region is bounded by the surfaces

$$\phi = c_j, \qquad j = 1 \ldots m, \tag{27.27}$$

where $m$ is the number of roots of (27.26). In this case, as we saw in 17.30, there exists a sufficient estimator. It follows that $w_0$ is defined by inequalities of the type

$$c_1 < \phi < c_2,$$

and we may, as in 26.24, use the $\phi$'s as new co-ordinates and calculate the size of a region from their distribution functions.

(b) As a simple case of (a), if

$$\phi' = A + B\phi \tag{27.28}$$

we find, for (27.26),

$$\phi^2 - (k_1 - B)\phi - (k_2 - A) = 0, \tag{27.29}$$

and the limits of $\phi$ are given by the two roots of this quadratic.

(c) If $\phi'$ cannot be expressed as a function of $\phi$ which does not involve the $x$'s explicitly, we shall have

$$\phi' \ge k_2 + k_1\phi - \phi^2. \tag{27.30}$$

In this case, considering $\phi$ and $\phi'$ as two co-ordinates of a point in a plane, we see that the region for which (27.30) is true is the one “above” the parabola $\phi' = k_2 + k_1\phi - \phi^2$, and that $k_1$, $k_2$ are determined by

$$\int_{-\infty}^{\infty} d\phi \int_{k_2+k_1\phi-\phi^2}^{\infty} p(\phi, \phi')\,d\phi' = 1 - \alpha \tag{27.31}$$

$$\int_{-\infty}^{\infty} \phi\,d\phi \int_{k_2+k_1\phi-\phi^2}^{\infty} p(\phi, \phi')\,d\phi' = 0. \tag{27.32}$$

In this instance we can reduce the problem to two dimensions by using two new co-ordinates $\phi$, $\phi'$.


Example 27.1

Consider the normal distribution

$$dF = \frac{1}{\sqrt{(2\pi)}}\exp\{-\tfrac12(x-\mu)^2\}\,dx.$$

To apply the foregoing theory with complete rigour we have to show that (27.14) is true. We shall assume that this is so, referring the reader for a formal proof to Neyman and Pearson (1936).

We have, then, with $\theta = \mu$,

$$\log p(\mu) = -\tfrac12 n\log(2\pi) - \tfrac12\sum(x-\mu)^2$$

$$\phi = \sum(x-\mu_0), \qquad \phi' = -n,$$

and hence this case reduces to that of (27.28). We write

$$\phi = n(\bar{x}-\mu_0),$$

and can clearly use $\bar{x}$ instead of $\phi$ as a co-ordinate, which confirms the result of 27.7 since $\bar{x}$ is sufficient for $\mu$.

It follows that the unbiassed region of Type A is given by

$$\bar{x} < \bar{x}_1, \qquad \bar{x} > \bar{x}_2$$

where

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,d\bar{x} = \alpha$$

and

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,(\bar{x}-\mu_0)\,d\bar{x} = 0.$$

Now if $H_0$ is true, that is if $\mu = \mu_0$, $\bar{x}$ is distributed in the form

$$dF = \sqrt{\frac{n}{2\pi}}\exp\{-\tfrac12 n(\bar{x}-\mu_0)^2\}\,d\bar{x}.$$

Hence $\bar{x}_1 - \mu_0 = -(\bar{x}_2 - \mu_0)$ and the Type A region is defined as being outside the range

$$\mu_0 - \Delta < \bar{x} < \mu_0 + \Delta,$$

where $\Delta$ is given by

$$\sqrt{\frac{n}{2\pi}}\int_{-\Delta}^{\Delta} e^{-\frac12 nu^2}\,du = \alpha.$$

In this case the Type A test leads to the usual test based on equal tail areas. The same test follows from the likelihood ratio, as the reader can verify for himself.
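
For the normal mean the two conditions of the example are met by limits placed symmetrically about $\mu_0$, and these are easily computed. The sketch below uses the book's convention that $\alpha$ is the probability of acceptance when the hypothesis is true, and relies on `statistics.NormalDist` from the Python standard library. The function name and the numerical case ($n = 4$, $\alpha = 0.98$) are illustrative assumptions.

```python
from statistics import NormalDist

def type_a_limits_mean(mu0, n, alpha):
    """Acceptance limits for the sample mean of n unit-variance normal observations.

    alpha is the probability of acceptance under H0 (the convention of the text).
    The limits are symmetrical about mu0, so the first-moment condition of the
    Type A region holds automatically.
    """
    dist = NormalDist(mu=mu0, sigma=n ** -0.5)   # distribution of the mean under H0
    half_width = dist.inv_cdf(0.5 + alpha / 2.0) - mu0
    return mu0 - half_width, mu0 + half_width

lower, upper = type_a_limits_mean(0.0, 4, 0.98)
```

With these inputs the half-width comes out at roughly 1.16, that is, about 2.33 standard errors on either side of $\mu_0$.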


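Example 27.2

If the distribution is normal with zero mean and variance $\sigma^2$, and $H_0$ is that $\sigma = \sigma_0$, we find

$$\phi = -\frac{n}{\sigma_0} + \frac{1}{\sigma_0^3}\sum x^2, \qquad \phi' = \frac{n}{\sigma_0^2} - \frac{3}{\sigma_0^4}\sum x^2.$$

This also satisfies (27.28), and the Type A region will be defined by

$$v_2 < v = \frac{1}{\sigma_0^2}\sum x^2, \quad \text{or} \quad v < v_1,$$

where

$$\int_{v_1}^{v_2} p(v)\,dv = \alpha$$

and

$$\int_{v_1}^{v_2} p(v)\,(v-n)\,dv = 0.$$

Here $p(v)$, the frequency function of the second moment, is

$$p(v)\,dv = \frac{1}{2^{\frac12 n}\,\Gamma(\frac12 n)}\,v^{\frac12(n-2)}e^{-\frac12 v}\,dv,$$

and we find, for the second equation,

$$\int_{v_1}^{v_2} v^{\frac12 n}e^{-\frac12 v}\,dv - n\int_{v_1}^{v_2} v^{\frac12 n-1}e^{-\frac12 v}\,dv = 0.$$

Integrating the first member by parts, $v^{\frac12 n}$ being one part, we are left with

$$v_1^{\frac12 n}e^{-\frac12 v_1} = v_2^{\frac12 n}e^{-\frac12 v_2}.$$

This has to be solved in conjunction with

$$\int_{v_1}^{v_2} p(v)\,dv = \frac{1}{2^{\frac12 n}\,\Gamma(\frac12 n)}\int_{v_1}^{v_2} v^{\frac12(n-2)}e^{-\frac12 v}\,dv = \alpha.$$

The numerical solution can be carried out by successive approximation or graphically.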
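
One way of carrying out the successive approximation is sketched below for samples of two, where $v$ is a $\chi^2$ variate with 2 degrees of freedom and every integral is elementary. The pair of conditions solved is $v_1e^{-\frac12 v_1} = v_2e^{-\frac12 v_2}$ together with $e^{-\frac12 v_1} - e^{-\frac12 v_2} = \alpha$, with $\alpha$ the probability of acceptance under $H_0$. The function names and the use of bisection are choices made for this illustration and are not the author's procedure.

```python
import math

def matching_upper_limit(v1, hi=200.0):
    """Return v2 > 2 with v2 * exp(-v2 / 2) equal to v1 * exp(-v1 / 2)."""
    target = v1 * math.exp(-v1 / 2.0)
    lo = 2.0                          # v * exp(-v / 2) decreases for v > 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid / 2.0) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_a_limits(alpha):
    """Limits (v1, v2) of the Type A acceptance interval for n = 2."""
    lo, hi = 1e-12, 2.0
    for _ in range(200):
        v1 = 0.5 * (lo + hi)
        v2 = matching_upper_limit(v1)
        accepted = math.exp(-v1 / 2.0) - math.exp(-v2 / 2.0)
        if accepted > alpha:
            lo = v1                   # interval too wide: raise the lower limit
        else:
            hi = v1
    v1 = 0.5 * (lo + hi)
    return v1, matching_upper_limit(v1)

def power(ratio, v1, v2):
    """Probability of rejection when sigma^2 = ratio * sigma_0^2."""
    return 1.0 - (math.exp(-v1 / (2.0 * ratio)) - math.exp(-v2 / (2.0 * ratio)))

v1, v2 = type_a_limits(0.98)
```

With limits found in this way the power equals the rejection level 0.02 at $\sigma = \sigma_0$ and exceeds it on both sides, which is the unbiassed behaviour the Type A region is designed to secure.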

In this connection Fig. 27.2 is of interest. It shows, for samples of two and $\alpha = 0.98$, the graphs of the power function for the ordinary test with equal tail areas, in addition to the power functions for the Type A test, the U.M.P. test with $\sigma > \sigma_0$ and the U.M.P. test with $\sigma < \sigma_0$.

Evidently, for $\sigma > \sigma_0$ the best critical region (2) has the greatest power (as it must have), and for $\sigma < \sigma_0$ the best region (1) has the greatest power. The test based on equal tail areas has a greater power than the Type A test for $\sigma > \sigma_0$ but a lower power for $\sigma < \sigma_0$, besides being biassed, as we have seen.

[Fig. 27.2. Power Curves of Four Different Tests of the Variance in Normal Samples of 2 (see text).]

As $n$ becomes larger the same effects persist, but the Type A and the “equal tails” tests become closer together in power. For samples of 20 or more there seems to be no serious loss in using the latter since the range of bias and its magnitude are then very small. If, of course, we knew in practice that $\sigma > \sigma_0$ we should use the U.M.P. test, and cases may arise, even when such knowledge is lacking, where “one-sided” hypotheses of this kind are all that concern us.

Invariance Theorem for Type A Regions 

27.9. It is important to show that the regions selected on the basis of Type A criteria conform to corresponding criteria if some other function $\zeta(\theta)$ is used instead of $\theta$ itself. In Example 27.2, for instance, where we took $\theta$ to be the standard deviation $\sigma$, should we have obtained the same regions if we had taken $\theta$ to be the variance $\sigma^2$? The answer is affirmative under certain general conditions, as we should expect from the relationship with sufficient estimators.

Suppose we have a new parameter $\zeta$, given by

$$\theta = \theta_0 + f(\zeta) = \psi(\zeta) \tag{27.33}$$

where $f(0) = 0$. Then if $p(\psi)$ satisfies (27.14) and the similar equation in second differentials, if $\psi$ is monotonically increasing and $d\psi/d\zeta > 0$, then the region based on $\zeta$ is an unbiassed critical region if that based on $\theta$ is so. It is sufficient to show that (27.15) and (27.16) are satisfied for $\zeta$. Now

$$\theta = \psi(\zeta), \qquad \psi(0) = \theta_0, \qquad \left[\frac{d\psi}{d\zeta}\right]_{\zeta=0} = \psi', \qquad \left[\frac{d^2\psi}{d\zeta^2}\right]_{\zeta=0} = \psi'' \text{ (say)}. \tag{27.34}$$

Thus

$$p'_\zeta(E \mid \psi(0)) = p'_\theta(E \mid \theta_0)\,\psi',$$

and

$$p''_\zeta(E \mid \psi(0)) = p''_\theta(E \mid \theta_0)\,\psi'^2 + p'_\theta(E \mid \theta_0)\,\psi''.$$

Solving these for $p'_\theta$ and $p''_\theta$ and substituting in (27.15) and (27.16), we find

$$p''_\zeta(E \mid \psi(0)) \ge k_1'\,p'_\zeta(E \mid \psi(0)) + k_2'\,p_\zeta(E \mid \psi(0)) \tag{27.35}$$

within $w$ and the contrary outside, where

$$k_1' = k_1\psi' + \frac{\psi''}{\psi'}, \qquad k_2' = k_2\psi'^2. \tag{27.36}$$

The result follows.


Regions of Type $A_1$

27.10. The regions of Type A are determined so that tests based on them are U.M.P.U. in the neighbourhood of $\theta_0$. We now consider a region, said to be of Type $A_1$, which is U.M.P.U. everywhere, i.e. which obeys (27.11) and (27.12) but has, in place of (27.13),

$$\int_{w_0} p\,dx \ge \int_w p\,dx \tag{27.37}$$

for every admissible $\theta$ and every $w$ satisfying the other two conditions.

It is conceivable that (27.37) does not entail the existence of a U.M.P.U. test, for there might be an unbiassed region of size $1 - \alpha$ for which the derivative of $\int p\,dx$ did not exist at $\theta = \theta_0$ but which nevertheless gave a more powerful test. This refinement, however, need not detain us.

27.11. If $W_+$ represents the sample-space where the density is not zero, if

$$\phi' = A + B\phi,$$

and if $\phi(\theta_0)$ does not vanish identically in $W_+$, then the unbiassed critical region of Type A is necessarily of Type $A_1$.

Let $w_0$ be the Type A region, which is determined ex hypothesi by two numbers $c_1$ and $c_2$ such that

$$\phi_0 < c_1 \text{ or } \phi_0 > c_2 \text{ inside } w_0,$$
$$c_1 < \phi_0 < c_2 \text{ outside } w_0.$$

We have to show that

$$\int_{w_0} p\,dx \ge \int_w p\,dx$$

for all admissible $\theta$ and any $w$ for which

$$\int_w p_0\,dx = 1 - \alpha, \tag{27.38}$$

with the consequence that

$$\int_w p_0'\,dx = 0. \tag{27.39}$$

Since $\phi' = A + B\phi$ we have, solving this equation as a linear differential equation of the first order,

$$\phi = e^{\int B\,d\theta}\int A\,e^{-\int B\,d\theta}\,d\theta + T\,e^{\int B\,d\theta}. \tag{27.40}$$

The reader may verify that this is a solution, and since it contains the arbitrary constant $T$ it is the most general solution. It follows that we may write

$$\log p = P(\theta) + T\,Q(\theta) + f(x), \text{ say}, \tag{27.41}$$

where $P$ and $Q$ do not depend upon $x$. We then have (primes denoting differentiation with respect to $\theta$ and the suffix 0 relating to $\theta_0$)

$$\phi_0 = P_0' + T\,Q_0'. \tag{27.42}$$

We note that $Q_0'$ cannot be zero, for if it were we should have

$$0 = \int \phi_0\,p_0\,dx = P_0'\int p_0\,dx = P_0',$$

which would imply that $\phi_0$ was identically zero.

In virtue of the lemma of 27.6, the proposition will be proved if we can show that
for fixed $\theta$ and $\theta_0$ there are two numbers $a$ and $b$, depending on $\theta$ and $\theta_0$ but not on the
$x$'s, such that

$$p > p_0\,(a\phi_0 + b) \quad \text{inside } w_0 \tag{27.43}$$

and the contrary outside $w_0$. Putting the values of $p$ and $\phi_0$ in this expression, we have
to show that $a$ and $b$ can be found such that, inside $w_0$,

$$\exp\{P(\theta) + TQ(\theta) + f(x)\} > \exp\{P(\theta_0) + TQ(\theta_0) + f(x)\}\,\{aP_0' + aTQ_0' + b\}$$

or, writing $r = P(\theta) - P(\theta_0)$, $q = Q(\theta) - Q(\theta_0)$, such that

$$\exp(r + qT) > aQ_0'T + aP_0' + b = a_1 T + b_1, \text{ say}. \tag{27.44}$$

Here $q$ cannot be zero, for if it were $Q(\theta)$ would be equal to $Q(\theta_0)$ and, integrating the
frequency functions over $W$, we should find $r = 0$. The alternative hypothesis would
not then differ essentially from $H_0$.

Consider at the outset the case when $c_1$ and $c_2$ are different. From (27.42) we see
that $\phi_0$ depends only on $T$ so far as variation in $x$ is concerned, and that

$$\text{if } \phi_0 = c_1, \quad T = \frac{c_1 - P_0'}{Q_0'} = T_1 \text{ (say)}, \tag{27.45}$$

$$\text{if } \phi_0 = c_2, \quad T = \frac{c_2 - P_0'}{Q_0'} = T_2 \text{ (say)}. \tag{27.46}$$

$T_1$ and $T_2$ are different. Choose $a_1$ and $b_1$ so as to satisfy

$$a_1 T_1 + b_1 = e^{r+qT_1}, \qquad a_1 T_2 + b_1 = e^{r+qT_2}. \tag{27.47}$$

Then (27.44) is satisfied at the boundary points and we have merely to prove that

$$c_1 < \phi_0 < c_2 \ \text{implies}\ e^{r+qT} < a_1 T + b_1,$$

$$\phi_0 < c_1 \ \text{and}\ \phi_0 > c_2 \ \text{imply}\ e^{r+qT} > a_1 T + b_1.$$

This follows from the fact that

$$y = e^{r+qT} - a_1 T - b_1$$

has only one minimum, between $T_1$ and $T_2$, as may be seen by differentiating it twice, for
the second derivative is positive and hence the first is a monotonically increasing function.
But $y$ vanishes at $T_1$ and $T_2$ and hence is negative between those values and positive
outside them.
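The convexity argument just given can be checked numerically. The following is a minimal sketch only: the values of r, q, T_1 and T_2 are hypothetical and chosen purely for illustration, and the function names are not from the text. It solves (27.47) for a_1 and b_1 and evaluates y = exp(r + qT) - a_1 T - b_1 between and outside the two boundary points.

```python
import math

def chord_coefficients(r, q, t1, t2):
    """Solve a1*T + b1 = exp(r + q*T) at T = t1 and T = t2, as in (27.47)."""
    e1 = math.exp(r + q * t1)
    e2 = math.exp(r + q * t2)
    a1 = (e2 - e1) / (t2 - t1)
    b1 = e1 - a1 * t1
    return a1, b1

def y(t, r, q, a1, b1):
    """The difference y = exp(r + q*T) - a1*T - b1 studied in the text."""
    return math.exp(r + q * t) - a1 * t - b1

# Hypothetical illustrative values; any q != 0 and t1 != t2 would serve.
r, q, t1, t2 = 0.3, -0.8, -1.0, 2.0
a1, b1 = chord_coefficients(r, q, t1, t2)

# y at nine interior points and at four points outside [t1, t2].
inside = [y(t1 + (t2 - t1) * i / 10, r, q, a1, b1) for i in range(1, 10)]
outside = [y(t, r, q, a1, b1) for t in (t1 - 2.0, t1 - 0.5, t2 + 0.5, t2 + 2.0)]
```

Under these assumed values every entry of `inside` is negative and every entry of `outside` is positive, which is the sign pattern the proof requires of a convex curve and its chord.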

Finally, if $c_1$ and $c_2$ are equal, say to $c$, we choose $a_1$ and $b_1$ so as to satisfy

$$P_0' + Q_0' T_0 = c, \tag{27.48}$$

$$q\,e^{r+qT_0} - a_1 = 0, \qquad e^{r+qT_0} - a_1 T_0 - b_1 = 0. \tag{27.49}$$

It will be found that $y$ has a minimum at $T = T_0$ and vanishes there. It follows that in
the region complementary to $w_0$, where $\phi_0 = c$, we have

$$e^{r+qT} = a_1 T + b_1,$$

and thus in $w_0$, where $\phi_0 < c$ or $c < \phi_0$, the left-hand side must be greater than the
right-hand side, since $y$ is positive everywhere except at its minimum. The demonstration is complete.


Example 27.3 

Consider again the data of Example 27.2. We have already seen that for this distribution
$\phi' = A\phi + B$, so that the regions of Type A are also of Type A$_1$. Among
unbiassed tests of the hypothesis this is the uniformly most powerful test.

Composite Hypotheses : Regions of Type B 

27.12. We now consider the extension of the foregoing results to the case when
$H_0$ is composite. For simplicity we will suppose that there are two parameters $\theta_1$ and $\theta_2$,
$H_0$ specifying $\theta_1$ as, say, $\theta_{10}$ and leaving $\theta_2$ undetermined. Then a region $w_0$ will be said
to be of Type B if

(a) $\int_{w_0} p(\theta_{10}, \theta_2)\,dx = 1 - \alpha$ for all admissible $\theta_2$; (27.50)

(b) $\int_{w_0} p(\theta_1, \theta_2)\,dx$ may be differentiated twice with respect to $\theta_1$ under the integral
sign;

(c) $\left[\dfrac{\partial}{\partial\theta_1}\int_{w_0} p(\theta_1, \theta_2)\,dx\right]_{\theta_1=\theta_{10}} = 0$; (27.51)

(d) for any other region $w$ satisfying (27.50),

$$\left[\frac{\partial^2}{\partial\theta_1^2}\int_{w_0} p\,dx\right]_{\theta_1=\theta_{10}} > \left[\frac{\partial^2}{\partial\theta_1^2}\int_{w} p\,dx\right]_{\theta_1=\theta_{10}}. \tag{27.52}$$

These conditions are obvious generalisations of those defining Type A. Putting now

$$\phi_j = \frac{\partial}{\partial\theta_j}\log p, \quad j = 1, 2, \tag{27.53}$$

$$\phi_{jk} = \frac{\partial\phi_j}{\partial\theta_k} = \phi_{kj}, \quad j, k = 1, 2, \tag{27.54}$$

we state that the Type B region will exist and may be found if $\phi_1$ and $\phi_2$ are algebraically
independent, if

$$\phi_{11} = A_0 + A_1\phi_1 + A_2\phi_2,$$
$$\phi_{12} = B_0 + B_1\phi_1 + B_2\phi_2, \tag{27.55}$$
$$\phi_{22} = C_0 + C_2\phi_2,$$

and if the law of distribution of $\phi_2$ is uniquely determined by its moments. We omit the
proof of this theorem, for which see Neyman (1935b).

Simple Hypotheses with Two Parameters : Regions of Type C 

27.13. The extension of the foregoing theory to the case of a simple hypothesis
specifying several parameters presents some new features. Again to simplify the discussion
we shall consider two parameters, $\theta_1$ and $\theta_2$.

Consider the power function in the neighbourhood of $\theta_1 = \theta_2 = 0$, which we will suppose
to be the values specified by $H_0$. Writing for the function

$$\beta(\theta_1, \theta_2 \mid w) = \int_w p(\theta_1, \theta_2)\,dx, \tag{27.56}$$

$$\left[\frac{\partial\beta}{\partial\theta_j}\right]_{\theta_1=\theta_2=0} = \beta_j, \quad j = 1, 2, \tag{27.57}$$

$$\left[\frac{\partial^2\beta}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1=\theta_2=0} = \beta_{jk}, \quad j, k = 1, 2, \tag{27.58}$$

we have, assuming an expansion by Taylor's theorem,

$$\beta(\theta_1,\theta_2 \mid w) = \beta(0,0\mid w) + \theta_1\beta_1(w) + \theta_2\beta_2(w) + \tfrac12\{\theta_1^2\beta_{11}(w) + 2\theta_1\theta_2\beta_{12}(w) + \theta_2^2\beta_{22}(w)\} + \ldots \tag{27.59}$$

To extend the idea of unbiassed tests to such a case we require in the first place

$$\beta_1(w) = 0, \qquad \beta_2(w) = 0. \tag{27.60}$$

Secondly, there will be a minimum at $\theta_1 = \theta_2 = 0$ if

$$\beta_{12}^2 - \beta_{11}\beta_{22} < 0 \tag{27.61}$$

and

$$\beta_{11},\ \beta_{22} > 0. \tag{27.62}$$

If these conditions are satisfied the power function for small values of $\theta_1$ and $\theta_2$ is effectively

$$\beta(\theta_1,\theta_2\mid w) = 1 - \alpha + \tfrac12\{\theta_1^2\beta_{11} + 2\theta_1\theta_2\beta_{12} + \theta_2^2\beta_{22}\}. \tag{27.63}$$

We may represent this diagrammatically as in Fig. 27.3, which shows one of the ellipses
for which the power function is constant.

Since the hypothesis $H_0$ is that $\theta_1 = \theta_2 = 0$, we may speak of the value $\theta_1$ as the “error
in $\theta_1$”, and similarly for $\theta_2$; and if, as in the case depicted, the co-ordinate axes are not
the same as the principal axes of the ellipse it is clear that for values of $\theta_1$ which are not
zero, errors of positive and negative sign in $\theta_2$ are not equal. From this viewpoint it may
be said that the minimisation of the power function does not control positive or negative
errors to the same extent; for the points A and B in Fig. 27.3 lie on the ellipse of constant
$\beta$, so that the probability of detecting them is the same, though A represents a positive
“error” in $\theta_2$ greater than the negative “error” given by B.

Fig. 27.3. Ellipse of Constant Power for Simple Hypothesis with Two Parameters (see text).

27.14. Whether this is a desirable property of the test depends to some extent on
what the test is intended to do. To avoid the anomaly we must require that

$$\beta_{12} = 0. \tag{27.64}$$

Furthermore, even if this condition is satisfied and the principal axes of the ellipse coincide
with the co-ordinate axes, there may still appear anomalies if the length of one axis is greater
than that of the other; for then errors in one parameter are not detected as frequently
as errors of the same size in the other. Here again it is a matter of particular circumstance
whether such an effect is regarded as objectionable. (We disregard the fact that it can
be removed by appropriate scaling of the parameters, which may or may not be artificial.)
To remove it we must require that

$$\beta_{11} = \beta_{22}, \tag{27.65}$$

so that the ellipses reduce to circles.

We may refer to the ellipses as “curves of equidetectability.”
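The geometry of these curves can be made concrete with a small computation. The sketch below is illustrative only: the function names and the numerical values of the second derivatives are assumptions, not taken from the text. It checks the conditions (27.61) and (27.62) for a minimum and returns the semi-axes and tilt of the ellipse on which the quadratic form of (27.63) is constant.

```python
import math

def is_minimum(b11, b12, b22):
    """Conditions (27.61)-(27.62) for a minimum of the power function at the origin."""
    return b12 * b12 - b11 * b22 < 0 and b11 > 0 and b22 > 0

def principal_axes(b11, b12, b22, c=1.0):
    """Semi-axes (long, short) and tilt of b11*t1^2 + 2*b12*t1*t2 + b22*t2^2 = c.

    The tilt is the angle between the theta_1 axis and the SHORTER principal
    axis; it is zero when b12 = 0, i.e. when condition (27.64) holds.
    """
    mean = 0.5 * (b11 + b22)
    half_gap = math.hypot(0.5 * (b11 - b22), b12)
    lam_small, lam_large = mean - half_gap, mean + half_gap  # eigenvalues of the form
    tilt = 0.5 * math.atan2(2.0 * b12, b11 - b22)
    return math.sqrt(c / lam_small), math.sqrt(c / lam_large), tilt

# Hypothetical values: a tilted ellipse, and the circular case of (27.64)-(27.65).
tilted = principal_axes(2.0, 0.5, 1.0)
circle = principal_axes(1.5, 0.0, 1.5)
```

With b12 = 0 and b11 = b22 the two semi-axes coincide, which is the circular case singled out by (27.64) and (27.65).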

27.15. With the foregoing explanation in mind we define $w_0$ as a regular unbiassed
critical region of Type C if it obeys the conditions

$$\beta_1(w_0) = \beta_2(w_0) = 0, \tag{27.66}$$

$$\beta_{12}(w_0) = 0, \tag{27.67}$$

$$\beta_{11}(w_0) = \beta_{22}(w_0), \tag{27.68}$$

and if, for any other region $w$ obeying these three conditions and for which

$$\beta(0,0\mid w_0) = \beta(0,0\mid w) = 1-\alpha, \tag{27.69}$$

we have

$$\beta_{11}(w_0) > \beta_{11}(w). \tag{27.70}$$

Secondly, if a region $w_1$ possesses the property that

$$\beta_1(w_1) = \beta_2(w_1) = 0, \tag{27.71}$$

$$\beta_{12}^2(w_1) - \beta_{11}(w_1)\,\beta_{22}(w_1) < 0, \tag{27.72}$$

and for any other region $w$ obeying the conditions

$$\beta(0,0\mid w_1) = \beta(0,0\mid w) = 1-\alpha, \tag{27.73}$$

$$\frac{\beta_{11}(w_1)}{\beta_{11}(w)} = \frac{\beta_{12}(w_1)}{\beta_{12}(w)} = \frac{\beta_{22}(w_1)}{\beta_{22}(w)}, \tag{27.74}$$

we have

$$\beta_{11}(w_1) > \beta_{11}(w), \tag{27.75}$$

we shall say that $w_1$ is a non-regular unbiassed critical region of Type C.

These equations are analytical ways of saying that the regular region of Type C is
the one, among all regions having circular curves of equidetectability, which has the smallest
radius for any given value of the power function; whereas the non-regular region of Type C
is the one, among all regions having similar ellipses of equidetectability, which has the
smallest axes.


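27.16. We now state without proof theorems similar to those demonstrated above
for the case of a single parameter.

Write

$$p_j = \left[\frac{\partial p}{\partial\theta_j}\right]_{\theta_1=\theta_2=0}, \qquad p_{jk} = \left[\frac{\partial^2 p}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1=\theta_2=0}, \qquad j, k = 1, 2.$$

Then $w_0$ is a regular unbiassed critical region of Type C if

(a) inside $w_0$

$$p_{11} \ge k_1(p_{11}-p_{22}) + k_2p_{12} + k_3p_1 + k_4p_2 + k_5p, \tag{27.76}$$

and outside $w_0$ the inequality is reversed;

(b)

$$\int_{w_0}p_j\,dx = \int_{w_0}p_{12}\,dx = \int_{w_0}(p_{11}-p_{22})\,dx = 0, \quad j = 1, 2. \tag{27.77}$$

Secondly, if $w_1$ satisfies the conditions

(a) that inside $w_1$

$$p_{11}\ge k_1(\gamma_{12}p_{11}-\gamma_{11}p_{12}) + k_2(\gamma_{22}p_{11}-\gamma_{11}p_{22}) + k_3p_1+k_4p_2+k_5p \tag{27.78}$$

and outside $w_1$ the inequality is reversed, the $k$'s as usual being constants and the $\gamma$'s obeying
the conditions

$$\gamma_{11} > 0, \qquad \gamma_{12}^2 - \gamma_{11}\gamma_{22} < 0;$$

(b)

$$\int_{w_1}p_j\,dx = \int_{w_1}(\gamma_{12}p_{11}-\gamma_{11}p_{12})\,dx = \int_{w_1}(\gamma_{22}p_{11}-\gamma_{11}p_{22})\,dx = 0; \tag{27.79}$$

then $w_1$ is a non-regular unbiassed critical region of Type C, having ellipses of equidetectability
determined by

$$\gamma_{11}\theta_1^2 + 2\gamma_{12}\theta_1\theta_2 + \gamma_{22}\theta_2^2 = \text{constant}. \tag{27.80}$$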
27.17. The theorem of invariance of 27.9 no longer holds in general for the present
case. If we transform to new parameters $\zeta_1$, $\zeta_2$, the equations of transformation

$$d\zeta_1 = \frac{\partial\zeta_1}{\partial\theta_1}\,d\theta_1 + \frac{\partial\zeta_1}{\partial\theta_2}\,d\theta_2,$$

etc., will not transform an ellipse co-axial with the co-ordinate axes $\theta_1$, $\theta_2$ into one co-axial
with $\zeta_1$, $\zeta_2$. Thus, in general, the effect of a transformation is to make a regular Type C
region into a non-regular Type C region.


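27.18. As usual, the conditions for the Type C region may be simply written in terms
of the derivatives of $\log p$. Write

$$\phi_j = \left[\frac{\partial}{\partial\theta_j}\log p\right]_{\theta_1=\theta_2=0}, \tag{27.81}$$

$$\phi_{jk} = \left[\frac{\partial^2\log p}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1=\theta_2=0}. \tag{27.82}$$

Then if

$$\phi_{jk} = A_{jk} + B_{jk}\phi_1 + C_{jk}\phi_2 \tag{27.83}$$

we shall have

$$p_{jk} = (\phi_j\phi_k + A_{jk} + B_{jk}\phi_1 + C_{jk}\phi_2)\,p \tag{27.84}$$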
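The step leading to (27.84) is the ordinary rule for differentiating $p = \exp(\log p)$ twice. Written out, as a sketch of the intermediate working rather than a quotation from the original:

```latex
\begin{aligned}
\frac{\partial p}{\partial\theta_j}
  &= p\,\frac{\partial\log p}{\partial\theta_j},\\[1ex]
\frac{\partial^2 p}{\partial\theta_j\,\partial\theta_k}
  &= p\,\frac{\partial\log p}{\partial\theta_j}\,
        \frac{\partial\log p}{\partial\theta_k}
   + p\,\frac{\partial^2\log p}{\partial\theta_j\,\partial\theta_k}.
\end{aligned}
```

Evaluating at $\theta_1 = \theta_2 = 0$ gives $p_{jk} = (\phi_j\phi_k + \phi_{jk})\,p$, and substituting the linear form (27.83) for $\phi_{jk}$ yields (27.84).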

and the inequality (27.76) becomes

$$(1 - k_1)\phi_1^2 - k_2\phi_1\phi_2 + k_1\phi_2^2 - k_3'\phi_1 - k_4'\phi_2 - k_5' \ge 0, \tag{27.85}$$

where the $k'$ are new constants easily expressible in terms of the old. They must be determined
so as to satisfy (27.77), which reduce to

$$\int_{w_0}\phi_j\,p\,dx = \int_{w_0}(\phi_1\phi_2 + A_{12})\,p\,dx = \int_{w_0}\{\phi_1^2 - \phi_2^2 + (A_{11} - A_{22})\}\,p\,dx = 0. \tag{27.86}$$


Example 27.4 

Suppose we have a sample of $n_1$ from a normal population with mean $\mu_1$ and unit
variance and a second sample of $n_2$ from a normal population with mean $\mu_2$ and also unit
variance. The simple hypothesis to be tested is $\mu_1 = \mu_2 = \mu_0$, where $\mu_0$ is some specified
value. We consider two cases:

(i) in which errors of the same size in $\mu_1$ and $\mu_2$ are equally important;

(ii) in which, for some reason, there is a stronger desire to avoid errors in $\mu_2$ than
in $\mu_1$ and therefore a greater number $n_2$ of members has been taken in the second
sample. We also assume that the sizes of errors judged of equal importance are
inversely proportional to $\sqrt{n}$, so that we are led to consider new parameters

$$\eta_1 = (\mu_1-\mu_0)\sqrt{n_1}, \qquad \eta_2 = (\mu_2-\mu_0)\sqrt{n_2}. \tag{27.87}$$

Case 1. The frequency function is

$$p \propto \exp\left\{-\tfrac12\sum_{j=1}^{n_1}(x_j - \mu_1)^2 - \tfrac12\sum_{j=n_1+1}^{n_1+n_2}(x_j-\mu_2)^2\right\}.$$

It will be found that

$$\phi_1 = n_1(\bar{x}_1 - \mu_0); \qquad \phi_2 = n_2(\bar{x}_2 - \mu_0);$$

$$\phi_{11} = -n_1 = A_{11}; \qquad \phi_{12} = 0 = A_{12}; \qquad \phi_{22} = -n_2 = A_{22}.$$

From (27.85) we then find

$$(1-k_1)\,n_1^2(\bar x_1-\mu_0)^2 - k_2\,n_1 n_2(\bar x_1 - \mu_0)(\bar x_2-\mu_0) + k_1\,n_2^2(\bar x_2 - \mu_0)^2 - k_3' n_1(\bar x_1 - \mu_0) - k_4' n_2(\bar x_2-\mu_0) - k_5' \ge 0. \tag{27.88}$$

The law of distribution of $\bar x_1$ and $\bar x_2$ may be written

$$p \propto \exp[-\tfrac12\{n_1(\bar x_1-\mu_0)^2 + n_2(\bar x_2-\mu_0)^2\}]. \tag{27.89}$$

Put $u = \sqrt{n_1}\,(\bar x_1 - \mu_0)$ and $v = \sqrt{n_2}\,(\bar x_2-\mu_0)$.

Then the region $w_0$ is determined by

$$(1-k_1)\,n_1u^2 - k_2\,uv\sqrt{(n_1 n_2)} + k_1 n_2 v^2 - k_3' u\sqrt{n_1} - k_4' v\sqrt{n_2} - k_5' \ge 0, \tag{27.90}$$

where

$$\int_{w_0}p(u,v)\,du\,dv = 1-\alpha,$$

$$\int_{w_0}u\,p(u,v)\,du\,dv = \int_{w_0}v\,p(u,v)\,du\,dv = \int_{w_0}uv\,p(u,v)\,du\,dv = 0, \tag{27.91}$$

$$\int_{w_0}(n_1u^2 - n_2v^2)\,p(u,v)\,du\,dv = (1-\alpha)(n_1-n_2), \tag{27.92}$$

and

$$p(u,v) = \frac{1}{2\pi}\exp\{-\tfrac12(u^2+v^2)\}.$$


It is evident from (27.90) that in the $(u, v)$ plane the boundary of $w_0$ is a conic. From
(27.91) we see that it must be coaxial with the co-ordinate axes and have its centre at the
origin. Hence $k_2 = k_3' = k_4' = 0$. Finally from (27.92) we find that the boundary is
of the form

$$\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1, \tag{27.93}$$

where

$$\frac{1}{a^2} = \frac{n_1(1-k_1)}{k_5'}, \qquad \frac{1}{b^2} = \frac{n_2k_1}{k_5'}. \tag{27.94}$$

The Type C regions are then defined by (27.93), but we have to express $a$ and $b$ in terms
of known constants, including the probability level $1-\alpha$. We have to satisfy (27.92),
and will show that a solution always exists.

Put

$$F(a,b) = \frac{1}{2\pi}\int_{w_0}(n_1u^2-n_2v^2)\exp\{-\tfrac12(u^2+v^2)\}\,du\,dv - (n_1-n_2)(1-\alpha). \tag{27.95}$$

If the boundary of $w_0$ is a circle, its radius is easily found to be

$$a = b = \sqrt{\{-2\log(1-\alpha)\}}.$$

The integral $F(a,b)$ outside this circle, by the substitution $u = r\cos\psi$, $v = r\sin\psi$, is
found to be

$$F(a,a) = (n_1-n_2)\,\frac{1}{2\pi}\int_{u^2+v^2>a^2}u^2\exp\{-\tfrac12(u^2+v^2)\}\,du\,dv - (n_1-n_2)(1-\alpha) = (1-\alpha)(n_1-n_2)\,\tfrac12 a^2.$$

Now taking $w_0$ as the space outside the parallel lines

$$v = \pm\lambda,$$

which is given by $a$ infinite, so that $\dfrac{2}{\sqrt{(2\pi)}}\displaystyle\int_\lambda^\infty e^{-\frac12x^2}\,dx = 1-\alpha$, we have

$$F(\infty,\lambda) = -(n_1-n_2)(1-\alpha) + \frac{1}{2\pi}\iint_{|v|>\lambda} n_1u^2 \exp\{-\tfrac12(u^2+v^2)\}\,du\,dv - \frac{1}{2\pi}\iint_{|v|>\lambda} n_2 v^2\exp\{-\tfrac12(u^2+v^2)\}\,du\,dv$$

$$\phantom{F(\infty,\lambda)} = -n_2\sqrt{\frac{2}{\pi}}\,\lambda\,e^{-\frac12\lambda^2} < 0.$$

Similarly,

$$F(\lambda,\infty) = n_1\sqrt{\frac{2}{\pi}}\,\lambda\,e^{-\frac12\lambda^2} > 0.$$

Thus, since $F(a, b)$ is continuous it must vanish somewhere in the range $\lambda < a < \infty$,
$\lambda < b < \infty$. The values for which it does so define the Type C region.
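The two limiting values of F used in this existence argument are easy to confirm numerically. The sketch below is an illustration under stated assumptions: the sample sizes and the value standing for 1 - α are hypothetical, the function names are invented for the purpose, and the integrals are approximated by a plain midpoint rule. It compares the closed forms for the two strip regions with direct quadrature.

```python
import math

def two_sided_tail(lam):
    """P(|Z| > lam) for a standard normal Z."""
    return math.erfc(lam / math.sqrt(2.0))

def solve_lambda(size):
    """Bisection for lam with P(|Z| > lam) = size (the text's 1 - alpha)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if two_sided_tail(mid) > size:
            lo = mid  # tail still too large, so lam must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_second_moment(lam, upper=12.0, steps=20000):
    """Midpoint-rule value of the integral of v^2 * phi(v) from lam to upper."""
    h = (upper - lam) / steps
    total = 0.0
    for i in range(steps):
        v = lam + (i + 0.5) * h
        total += v * v * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

def strip_closed(n, lam):
    """The common factor n * sqrt(2/pi) * lam * exp(-lam^2 / 2)."""
    return n * math.sqrt(2.0 / math.pi) * lam * math.exp(-0.5 * lam * lam)

# Hypothetical inputs: n1 = 5, n2 = 8, and 0.05 in the role of 1 - alpha.
n1, n2, size = 5, 8, 0.05
lam = solve_lambda(size)
m2 = tail_second_moment(lam)

# Region |v| > lam (a infinite): u is unrestricted, so E[u^2] = 1 applies.
F_inf_lam_numeric = n1 * size - 2.0 * n2 * m2 - (n1 - n2) * size
F_inf_lam_closed = -strip_closed(n2, lam)

# Region |u| > lam (b infinite): the roles of u and v are exchanged.
F_lam_inf_numeric = 2.0 * n1 * m2 - n2 * size - (n1 - n2) * size
F_lam_inf_closed = strip_closed(n1, lam)
```

The opposite signs of the two values are what guarantee, by continuity, a root of F(a, b) between the two degenerate regions.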


Case 2. In this case, using the parameters $\eta_1$ and $\eta_2$ of (27.87), we find

$$\phi_1 = u, \qquad \phi_2 = v,$$

$$\phi_{11} = -1, \qquad \phi_{12} = 0, \qquad \phi_{22} = -1.$$

The inequality becomes

$$(1-k_1)u^2 - k_2uv + k_1v^2 - k_3'u - k_4'v - k_5' \ge 0,$$

where

$$\int_{w_0}(u^2-v^2)\,p(u,v)\,du\,dv = 0.$$

In a similar way it follows that the Type C region is the one lying outside the circle

$$u^2+v^2 = -2\log(1-\alpha).$$

We leave the verification of this result to the reader.


Certain Limiting Properties 

27.19. From the foregoing examples it will be seen that in certain cases the optimum 
critical regions are by no means easy to determine numerically ; and it is not always clear 
that the labour involved is repaid by the results. Some consideration has been given by 
various writers to tests which have optimum properties for large n, the presumption being 
that the same tests will be good, if not the best, for small values. As usual when several 
limiting processes are involved simultaneously, the rigorous enunciation and proof of 
theorems in this field is a matter of some complexity, and we shall here merely indicate 
some of the results in very general terms without including proofs. 

It has been shown by Neyman (1938b) that there do exist tests which are unbiassed
in the limit, and rules have been given for finding them. It has also been shown by Wald
(1941a) that there exist tests which are most powerful in the limit, and that such as are
based on maximum likelihood estimators are of this class. The tests are uniformly most
powerful for the single parameter $\theta > \theta_0$ and for $\theta < \theta_0$, but not both; and for any range
they are the most powerful unbiassed tests in the limit. Furthermore, the Type A test
tends to the most powerful unbiassed form.

The general conclusion seems to be that, even where the variation is not normal, most
of the tests in current use which are based on likelihood estimators have optimum properties
in the limit, and may therefore be used confidently for moderate or large samples. For
small samples the position is not so clear, particularly for non-normal variation. Tests
based on inefficient estimators are presumably less satisfactory; and for the non-parametric
case there is as yet no complete theory. On this latter question reference may be
made to a useful review by Scheffé (1943).





The Unbiassed Character of Likelihood-ratio Tests 

27.20. It is of some interest to consider how far the tests based on likelihood (26.35)
are unbiassed.

It has been shown (Pitman, 1939b; Brown, 1939) that the Neyman-Pearson test in
the problem of $k$ samples based on $\lambda_{H_1}$ is biassed unless all the samples are of the same size;
but that Bartlett's modification (26.42) is unbiassed. We prove this in 27.25 below.
On the other hand, Daly (1940) has shown that in certain multivariate tests such as those
of regressions, multiple correlations, Hotelling's $T$ (which we introduce in the next chapter),
and the ordinary analysis of variance and covariance for orthogonal or non-orthogonal
data, the likelihood-ratio tests are unbiassed, at least in the Type A sense (i.e. locally)
and in some cases completely so.


Pitman's Method for Location and Scale Parameters 

27.21. In the special but not uncommon case where the hypotheses under test concern
parameters of scale or location, a simplified approach is possible. Suppose the joint
distribution of $k$ sample-values is

$$dF = f(x_1-\theta_1,\ x_2-\theta_2,\ \ldots,\ x_k-\theta_k)\,dx_1\ldots dx_k. \tag{27.96}$$

We seek for a statistic $J$, independent of the $\theta$'s, to test the hypothesis; and clearly, if the
test is to be satisfactory, $J$ must be independent of the origin, i.e. must be seminvariant.
The test that the $\theta$'s are all equal is then equivalent to testing the hypothesis

$$\theta_1=\theta_2=\ldots=\theta_k = 0. \tag{27.97}$$

Without loss of generality we may suppose the hypothesis rejected if $J$ is small and less
than some quantity depending on the acceptance value $\alpha$, and we may also suppose $J$
positive; for if either condition is not satisfied we can transfer to some other function of
$J$ for which it is.

In the sample space $W$, $J$ must be constant along any line parallel to $x_1 = x_2 = \ldots = x_k$,
and therefore the critical region $w_0$ will be the one lying outside a hypercylinder
whose axis is parallel to this line. When $H_0$ is true, the probability of rejection is then

$$\int_{w_0}dF(x_1\ldots x_k) = 1-\alpha, \tag{27.98}$$

and when it is not true the probability is

$$\int_{w_0}dF(x_1-\theta_1,\ \ldots,\ x_k-\theta_k) = \int_w dF(x_1\ldots x_k), \tag{27.99}$$

where $w$ is merely derived from $w_0$ by a translation in $W$ without rotation. If $L$ is any line
parallel to $x_1 = \ldots = x_k$, we write

$$P(L) = \int_L dF(x_1\ldots x_k) = \int f(x_1\ldots x_k)\,d\eta, \tag{27.100}$$

where

$$\eta = \frac{1}{\sqrt{k}}\,\Sigma(x), \tag{27.101}$$

and $\eta$ is thus the distance of the point $(x_1\ldots x_k)$ from the plane $\Sigma(x) = 0$.
. (27.101) 
Now if $w_0$ is defined as the locus of all lines for which $P(L) \ge h$, a constant, $P(L)$ will
be less than $h$ on any $L$ which is in $w$ but not in $w_0$. Hence

$$\int_{w_0}dF \ge \int_w dF, \tag{27.102}$$

and so the resulting test is unbiassed. Thus an unbiassed test is given by choosing $J$ so
that at any point of a line $L$ it is equal to $P(L)$ at that point. Now we may write for the
variable co-ordinate on a particular $L$, say $\xi_r$,

$$\xi_r = x_r - t,$$

where

$$t = \frac{1}{k}\,\Sigma(x) = \frac{\eta}{\sqrt{k}}.$$

Hence

$$P(L) = \sqrt{k}\int f(x_1 - t,\ x_2-t,\ \ldots,\ x_k-t)\,dt. \tag{27.103}$$

Taking

$$J = \frac{1}{\sqrt{k}}\,P(L),$$

we find

$$J = \int_{-\infty}^{\infty} f(x_1-t,\ \ldots,\ x_k-t)\,dt, \tag{27.104}$$

which gives us an unbiassed test.


Example 27.5 

Consider the case where the variables are distributed normally with unit variance:

$$f = \frac{1}{(2\pi)^{\frac12 k}}\exp\{-\tfrac12\Sigma(x_j-\theta_j)^2\}.$$

Then we have, from (27.104),

$$J = \frac{1}{(2\pi)^{\frac12k}}\int_{-\infty}^\infty\exp\{-\tfrac12\Sigma(x_j-t)^2\}\,dt = \frac{1}{\sqrt{k}\,(2\pi)^{\frac12(k-1)}}\exp(-\tfrac12 S),$$

where $S = \Sigma(x-\bar x)^2$.

In practice we should take $S$ as our criterion, not $J$, and reject the hypothesis that
the means were equal if $S$ exceeded some fixed value determined by $\alpha$. We observe
that in fact $S$ is distributed as $\chi^2$ with $k-1$ degrees of freedom when $H_0$ is true, so that
this value is easily ascertained.
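Completing the square in the exponent, $\Sigma(x_j - t)^2 = S + k(t - \bar x)^2$, gives $J = k^{-1/2}(2\pi)^{-(k-1)/2}\exp(-\tfrac12 S)$, so that $J$ is a decreasing function of $S$. The sketch below checks this numerically; the sample values are hypothetical, the function names are invented for illustration, and the integral in (27.104) is approximated by a simple midpoint rule.

```python
import math

def s_statistic(xs):
    """S = sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def j_closed_form(xs):
    """J obtained by completing the square in (27.104) for unit-variance normals."""
    k = len(xs)
    return math.exp(-0.5 * s_statistic(xs)) / (
        math.sqrt(k) * (2.0 * math.pi) ** (0.5 * (k - 1))
    )

def j_numeric(xs, half_width=12.0, steps=48000):
    """Midpoint-rule approximation to the integral over t in (27.104)."""
    k = len(xs)
    centre = sum(xs) / k
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        t = centre - half_width + (i + 0.5) * h
        total += math.exp(-0.5 * sum((x - t) ** 2 for x in xs))
    return total * h / (2.0 * math.pi) ** (0.5 * k)

# Hypothetical sample of k = 4 values.
xs = [0.2, -1.1, 0.7, 1.5]
shifted = [x + 3.0 for x in xs]  # a common change of origin
```

Because $S$ is unchanged when every observation is shifted by the same amount, the criterion is independent of the origin, as the general argument requires.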


27.22. Consider now the case where the frequency function is

$$\frac{1}{\theta_1\ldots\theta_k}\,f\left(\frac{x_1}{\theta_1},\ \ldots,\ \frac{x_k}{\theta_k}\right). \tag{27.105}$$

If the $x$'s are positive in range we put

$$y_i = \log x_i, \qquad \phi_i = \log\theta_i, \tag{27.106}$$

and for the frequency function of the $y$'s we find

$$\exp(\Sigma y - \Sigma\phi)\,f(e^{y_1-\phi_1},\ \ldots,\ e^{y_k-\phi_k}). \tag{27.107}$$

This reduces to our first case, and we have an unbiassed criterion that

$$\phi_1=\phi_2=\ldots=\phi_k$$

by putting

$$J = \int_{-\infty}^{\infty}\exp(\Sigma y - kt)\,f(e^{y_1-t},\ e^{y_2-t},\ \ldots,\ e^{y_k-t})\,dt = \Pi(x)\int_0^\infty f\left(\frac{x_1}{\theta},\ \ldots,\ \frac{x_k}{\theta}\right)\frac{d\theta}{\theta^{k+1}}, \tag{27.108}$$

the second form following from the substitution $\theta = e^t$. When the $x$'s are not necessarily
positive the expression remains the same, except that in (27.108) $\Pi(x)$ becomes $\Pi(|x|)$.
Small values of $J$ are significant.


27.23. Suppose now that our hypothesis asserts the equality of $\theta$'s or $\phi$'s and
states that they have a common value $\theta_0$ or $\phi_0$, as the case may be. Then if we take


r -(M 


/(* i 


*k)> 


(27.109) 


the test will be unbiassed. Moreover, if we regard small values of $J'$ as significant and the
$x$'s are independent, and if each frequency function is unimodal, then when

$$\theta_1=\theta_2=\ldots=\theta_k=\theta_0$$

is not true the probability that $J'$ exceeds the specified limit based on $1-\alpha$ increases as
any $\theta$ tends to $\theta_0$. $J'$ therefore provides an unbiassed test.


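27.24. Finally, consider the case of $k$ variates each distributed in the form typified by

$$dF = \frac{1}{\Gamma(m)}\left(\frac{x}{\phi}\right)^{m-1}e^{-x/\phi}\,\frac{dx}{\phi}. \tag{27.110}$$

Their joint distribution is

$$dF = \frac{\Pi(x^{m-1})\,e^{-\Sigma(x/\phi)}}{\Pi\{\phi^m\,\Gamma(m)\}}\,\Pi(dx). \tag{27.111}$$

Hence, to test the hypothesis that the samples have the same $\phi$ we have

$$J = \frac{\Pi(x^m)}{\Pi\{\Gamma(m)\}}\int_0^\infty\frac{e^{-\Sigma x/\theta}}{\theta^{M+1}}\,d\theta, \quad\text{where } M = \Sigma(m),$$

$$\phantom{J} = \frac{\Gamma(M)}{\Pi\{\Gamma(m)\}}\cdot\frac{\Pi(x^m)}{(\Sigma x)^M}. \tag{27.112}$$

It is sometimes convenient to deal with

$$K = \frac{\Pi(x^m)}{(\Sigma x)^M}, \tag{27.113}$$

which differs from $J$ only by a constant factor.

The maximum value of $K$ is

$$\frac{\Pi(m^m)}{M^M}$$

and we put

$$L = -\log\frac{K}{\max K} = M\log\left(\frac{\Sigma x}{M}\right) - \Sigma\left(m\log\frac{x}{m}\right). \tag{27.114}$$

$L$ is essentially not negative, and large values are significant.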
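The statement that L of (27.114) is never negative, and vanishes when each x is proportional to its m, can be illustrated by simulation. This is a rough sketch only: the indices m, the seed and the function name are hypothetical choices made for the illustration, and the variates are drawn with Python's standard gamma generator under the hypothesis of a common scale.

```python
import math
import random

def l_statistic(xs, ms):
    """L of (27.114): M*log(sum(x)/M) - sum(m*log(x/m))."""
    big_m = sum(ms)
    return big_m * math.log(sum(xs) / big_m) - sum(
        m * math.log(x / m) for x, m in zip(xs, ms)
    )

random.seed(0)
ms = [2.0, 3.5, 5.0]  # hypothetical indices m for k = 3 variates

# 1000 simulated sets of variates, each x having the form (27.110) with scale 1.
samples = [[random.gammavariate(m, 1.0) for m in ms] for _ in range(1000)]
values = [l_statistic(xs, ms) for xs in samples]

# When every x is the same multiple of its m, K attains its maximum and L = 0.
proportional = l_statistic([3.0 * m for m in ms], ms)
```

The non-negativity follows from the concavity of the logarithm, so it holds for every sample and not only on average.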
For testing the hypothesis that a set of variances have some specified equal value, we
find similarly from (27.109)

$$L' = \Sigma(x) - M - \Sigma\left(m\log\frac{x}{m}\right). \tag{27.115}$$

27.25. The foregoing result has an immediate application to the case of k normal 
samples, for the variances are then distributed in the Type III form of equation (27.110). 
The criterion L becomes 

L = N log ^ ^ log ~ • • -(27.116) 

where $\nu$ as usual represents the number of degrees of freedom and $N = \Sigma(\nu)$. This, as
will be seen by comparison with (26.93), is equivalent to Bartlett's test, and shows that
it is unbiassed.


NOTES AND REFERENCES 

For the theory of unbiassed tests see particularly Neyman and Pearson (1936; 1938)
and Neyman (1935b). Regions of Type B have also been considered by Scheffé (1942a),
who discusses a Type B$_1$ standing in relation to B as Type A$_1$ to Type A.

For limiting properties see Neyman (1938b) and Wald (1941a).

See also references to the previous chapter. 


EXERCISES 

27.1. Show that the test of Example 27.1 provides regions which are of Type A$_1$
as well as of Type A, and that the test is a U.M.P.U. one.


27.2. Show that the cumulants of the distribution of $L$ of (27.114) are

$$\kappa_1 = M\{G_1(M) - \log M\} - \Sigma[m\{G_1(m) - \log m\}],$$

$$\kappa_r = (-1)^r\{\Sigma\,m^r G_r(m) - M^r G_r(M)\}, \quad r > 1,$$

where

$$G_r(m) = \frac{d^r}{dm^r}\log\Gamma(m).$$

Hence show that the cumulants of

$$\frac{2L}{1+\mu}$$

are approximately $2^{r-1}(k-1)\,\Gamma(r)$, where

$$\mu = \frac{1}{6(k-1)}\left\{\Sigma\left(\frac1m\right) - \frac1M\right\},$$

and thus that

$$\frac{2L}{1+\mu}$$

is distributed approximately as $\chi^2$ with $k-1$ degrees of freedom.

(Bartlett, 1937c; Pitman, 1939b.)
27.3. Show that in samples of 3 from a normal population the distribution of the
range $r$ is given by

$$dF = \frac{6}{\sigma\sqrt{\pi}}\,e^{-r^2/4\sigma^2}\int_0^{r/(\sigma\sqrt6)}\frac{1}{\sqrt{(2\pi)}}\,e^{-\frac12y^2}\,dy\,dr.$$


Hence that an unbiassed critical region of Type A is given by 


r 



the region lying outside $r_1 < r < r_2$.


(Neyman and Pearson, 1936.) 
MULTIVARIATE ANALYSIS

28.1. We have already considered some aspects of the case in which each member
of a population is characterised by several variates $x_1 \ldots x_p$. For instance, we have
examined the measurement of correlation between the variates and the regression of one
variate on some or all of the others. In this chapter we shall extend our inquiries into
the multivariate case a good deal further, mainly by taking into account the possibility
that different sample-members may have emanated from different populations. This
will lead to some generalisations of the methods already discussed for the univariate case,
such as tests of homogeneity and tests of differences between two samples. Some of our
known results generalise with nothing more than additional mathematical complexity;
but in others certain new features appear, and the theory of multivariate analysis is not
entirely a matter of generalising univariate results to $p$ dimensions.

28.2. One or two examples will illustrate the kind of problem with which we are 
concerned. A number of skulls are discovered in a burial-ground. They are found to 
vary among themselves in the manner usual in biological material. Is the observed variation
consistent with the hypothesis that all the skulls were derived from members of the
same race or does it suggest a mixture of racial types ? If heterogeneity is indicated, do 
the skulls fall into two well-defined categories, such as we might expect if the burial-ground 
were the site of a battle between two races such as Saxon and Celt; or are there several 
types such as we should expect in the normal burial-ground of a town where races were 
living together and interbreeding ? Or again, if the skulls are compared with another set 
known to have been buried at a much earlier time from the same race, is there any evidence 
of a significant change in skulls from one period to the other ? 

There is no single measurement on a skull which is marked out from the infinite number 
of possible measurements for deciding questions of this kind. It is quite common for 
thirty or forty measurements to be taken by craniometricians on a single skull. Even if 
we reject many of these for practical reasons, leaving out the jawbone, for instance, because 
it is often separated from the skull and cannot be identified, we shall still be left with a 
number $p$ which require consideration. For $n$ skulls we shall then have $n$ sets of $p$ values
corresponding to variates $x_1 \ldots x_p$ which are, in general, correlated among themselves
and may be highly so. Our problem is to test the homogeneity of these values, or to estimate
differences between parent populations from which they were derived. We may,
of course, apply methods which are already familiar by picking out one variate and testing 
for homogeneity. But we might pick out quite an unsuitable one and sacrifice most of the 
information. Even if time permits we cannot take each variate in turn and test it because 
the variates are correlated and our p tests are not independent. 

28.3. Again, suppose we have two different breeds of laying hen and are given a 
batch of eggs from the hen-run without knowing which hen laid which egg. We require 
to allocate the eggs to the two breeds. Assuming that there is no decisive criterion such 
as colour of shell, we may measure various properties of the eggs such as length, breadth, 
weight, volume, specific gravity and so on. Some of these measurements will be highly
correlated or, in the extreme case, perfectly correlated, as with weight, volume and specific 
gravity. In such circumstances we may reject some variates as redundant; but in general 
we shall be left with several sets of measurements. Our problem is to find some method 
based on the retained variates for allocating the eggs to the correct parent breed. In 
particular we might search for the best linear function of the variates to discriminate between 
breeds and to enable us to assign the eggs with the maximum probability of correctness. 

28.4. Throughout the whole chapter we shall, except when the contrary is stated, 
assume that the variation is normal. In addition, to render our formulae a little less 
cumbrous we shall borrow a summation convention from the tensor calculus. If the 
affixes i, j range from 1 to p we shall write 

$$A^{ij}a_{ij} = \sum_{i=1}^p\sum_{j=1}^p A^{ij}a_{ij}, \tag{28.1}$$

the affixes to $A$ being regarded as ordinary superscripts, not as powers. Similarly we
shall have

$$A^{ij}a_{ik} = \sum_{i=1}^p A^{ij}a_{ik}. \tag{28.2}$$

Whenever an affix occurs as a superscript and a subscript, summation is to be understood.
Clearly the actual letter used is a dummy and we have, for instance,

$$A^{ij}a_{ij} = A^{kj}a_{kj} = A^{kl}a_{kl}. \tag{28.3}$$

We shall write the array of values $A^{ij}$ (a square matrix) as $(A^{ij})$ and its determinant
as $|A^{ij}|$ or simply as $|A|$.

To every matrix $(a_{ij})$ with a non-vanishing determinant there corresponds a reciprocal
or inverse matrix which we may write $(a^{ij})$. Since

$$(a_{ij})\,(a^{ij}) = 1,$$

we have, on carrying out the multiplication,

$$a^{ij}a_{ik} = 1,\ j = k; \qquad a^{ij}a_{ik} = 0,\ j \ne k,$$

which we may express as

$$a^{ij}a_{ik} = \delta^j_k, \tag{28.4}$$

where $\delta^j_k$, one form of the Kronecker delta, is zero if $j \ne k$ and unity otherwise. The quantity
$a^{ij}$ is the minor of $a_{ij}$ in $|A|$ divided by $|A|$ itself.
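The summation convention is easily mimicked in code. The following is a minimal sketch for the case p = 2 only; the function names are invented for the illustration and the matrix is a hypothetical correlation matrix with correlation 0.5. It forms the full contraction of (28.1), the partial contraction of (28.2), and confirms the reciprocal relation (28.4).

```python
def contract_full(A, a):
    """A^{ij} a_{ij}: both affixes are repeated, so sum over i and j, as in (28.1)."""
    p = len(A)
    return sum(A[i][j] * a[i][j] for i in range(p) for j in range(p))

def contract_first(A, a):
    """A^{ij} a_{ik}: only i is repeated, leaving an array indexed by (j, k), as in (28.2)."""
    p = len(A)
    return [[sum(A[i][j] * a[i][k] for i in range(p)) for k in range(p)]
            for j in range(p)]

def inverse_2x2(a):
    """Reciprocal of a 2-by-2 matrix: signed minors divided by the determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Hypothetical correlation matrix (rho = 0.5) and its reciprocal.
rho = [[1.0, 0.5], [0.5, 1.0]]
rho_inv = inverse_2x2(rho)

delta = contract_first(rho_inv, rho)  # should reproduce the Kronecker delta
trace = contract_full(rho_inv, rho)   # full contraction of a matrix with its reciprocal
```

For a symmetric matrix the full contraction of the reciprocal with the original equals p, here 2, which is a convenient check on any hand calculation using the convention.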

28.5. It will further simplify our formulae and will give rise to no loss of generality 
if we suppose our variates to be in standard measure, that is to say, to have zero mean 
and unit variance. If we require results for the more general case we can easily obtain 
them from transformations of the type 

x i = a i + m i .(28.5) 

With this convention the equation of the multivariate normal distribution (cf. 15.12,
vol. I, p. 376) may be written

$$dF = \frac{|A|^{\frac12}}{(2\pi)^{\frac12p}}\exp(-\tfrac12A^{ij}x_ix_j)\,dx_1\ldots dx_p, \tag{28.6}$$

where the $A$'s are related to the correlation determinant

$$\Delta = |\rho_{ij}|. \tag{28.7}$$

In fact $(A^{ij})$ is reciprocal to $(\rho_{ij})$, as we saw in 15.12.

28.6. We shall also frequently refer to the matrix of sample variances and covariances
which we shall call the dispersion matrix and write as $(a_{ij})$, where

$$a_{ij} = \frac1n\sum_{u=1}^n(x_i-\bar x_i)(x_j-\bar x_j). \tag{28.8}$$

This, it is to be remembered, is in standard measure for the population, that is to say the
observed variates are taken from the parent means and divided by the parent standard
deviations.


Wishart's Distribution

28.7. We now proceed to generalise to p variates the joint distribution of dispersions 
arrived at in 14.12 (vol. I, p. 339) for the bivariate case ; and we shall also show that 
the distribution is independent of that of means. The result and method of proof are 
due to Wishart (1928). 

First of all let us write the result for the bivariate case in our new notation. For
the distribution of means we have

$$dF = \frac{n\,|A|^{\frac12}}{2\pi}\exp\left(-\frac n2 A^{ij}\bar x_i\bar x_j\right)d\bar x_1\,d\bar x_2, \quad i, j = 1, 2, \tag{28.9}$$

and for that of dispersions 


= ( ? v -1 1 a i« n_i > _ i ° _, 




dcL\\ dd x j (28.10) 


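For instance, we have

$$a_{11} = s_1^2, \qquad a_{12} = r s_1 s_2, \qquad a_{22} = s_2^2,$$

$$(A^{ij}) = \begin{pmatrix}\dfrac{1}{1-\rho^2} & \dfrac{-\rho}{1-\rho^2}\\[2ex] \dfrac{-\rho}{1-\rho^2} & \dfrac{1}{1-\rho^2}\end{pmatrix},$$

so that (28.10) is equivalent to


dF =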

(1 - r 2 )V 4 a 

(i_p*)i(»-i) 


X exp | W ~ 2prSl St + a *) | * 1 ds * dr - 


This, with the substitution 


is the form found in equation (14.44), vol. I, p. 342, when it is remembered that we are 
working in standard measure. 
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28.8. Now consider the general case. With a sample of n values of p variates we 
consider p rectangular spaces of n dimensions each as the domain of variation. If a point 
in one of these spaces be fixed, the variation in the other spaces is constrained for fixed 
values of the sample dispersions. The following argument is a generalisation of that given 
in 14*12 leading to the bivariate result, and the reader may like to refresh his memory 
by re-reading that section. 

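Writing $x_{j1}, \ldots, x_{jn}$ for the n values of the jth variate, we have for the density function
of the whole sample, from (28.6),

$$ f = \frac{|A|^{n/2}}{(2\pi)^{np/2}} \exp\left\{ -\tfrac{1}{2} \sum_{k=1}^{n} A^{ij} x_{ik} x_{jk} \right\} $$

$$ \phantom{f} = \frac{|A|^{n/2}}{(2\pi)^{np/2}} \exp\left[ -\tfrac{1}{2} \sum_{k=1}^{n} \left\{ A^{ij} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) \right\} \right] \times \exp\left\{ -\tfrac{1}{2} n A^{ij} \bar{x}_i \bar{x}_j \right\}. \tag{28.11} $$

We may thus factorise the density function into two parts,

$$ f_1 = \left( \frac{n}{2\pi} \right)^{p/2} |A|^{1/2} \exp\left\{ -\tfrac{1}{2} n A^{ij} \bar{x}_i \bar{x}_j \right\}, \tag{28.12} $$

$$ f_2 = \frac{|A|^{(n-1)/2}}{(2\pi)^{p(n-1)/2}} \exp\left\{ -\tfrac{1}{2} n A^{ij} a_{ij} \right\}, \tag{28.13} $$

where we have chosen the constant factor of $f_1$ so that the distribution shall have the total
frequency unity.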

Consider now the volume element $\prod_{k=1}^{n} dx_{1k}\, dx_{2k} \ldots dx_{pk}$. In any particular n-space
the density is constant over hyperspheres centred at the mean. The volume element may
then be represented as the product of elements $d\bar{x}_j$ and of independent elements depending
on dispersions. In the total space of pn dimensions the volume element may thus be
represented as the product of p elements $d\bar{x}_j$ and an independent element depending on
dispersions. Thus the volume element also factorises, and we have immediately for the
distribution of means

$$ dF = \left( \frac{n}{2\pi} \right)^{p/2} |A|^{1/2} \exp\left( -\tfrac{1}{2} n A^{ij} \bar{x}_i \bar{x}_j \right) \prod_{j} d\bar{x}_j, \tag{28.14} $$

showing that the means are distributed in the multivariate normal form independently 
of dispersions. 

If we define a matrix (B) with elements $\tfrac{1}{2} n$ times those of (A), we may write the
distribution of means in the simple form

$$ dF = \frac{|B|^{1/2}}{\pi^{p/2}} \exp\left( -B^{ij} \bar{x}_i \bar{x}_j \right) \prod_{j} d\bar{x}_j. \tag{28.15} $$

We note that this checks with the known results for p = 1 and p = 2. It is also seen
almost at once that the variance of $\bar{x}_j$ is $\sigma_j^2 / n$, as we expect.

28.9. We have now to consider the more complicated expression for the volume
element of dispersions. Let us in the first instance transfer our origins to the sample means,
remembering that in doing so we have lost one dimension (or degree of freedom) in the
variation of our sample-points. Let $P_1 \ldots P_p$ be the sample-points whose co-ordinates
are the n values of $x_1 \ldots x_p$, one point P lying in each n-space. We shall consider in
turn the variation of $P_1$, then that of $P_2$ for fixed $P_1$, then that of $P_3$ for fixed $P_1$ and $P_2$,
and so on. The total variation will be given by multiplying the various expressions so
obtained; and it will be sufficient if we consider the typical case of the variation of $P_m$
for m − 1 fixed points $P_1 \ldots P_{m-1}$.

For a fixed length $OP_m$ and fixed angles with $OP_1 \ldots OP_{m-1}$, $P_m$ can vary on a
hypersphere of n − m dimensions; for, if we fix any particular angle, $P_m$ is constrained
to lie on a hypercone which cuts its hypersphere of variation in a hypersphere of one fewer
dimensions, and the fixation of the origin at the sample mean imposes a further constraint.
Further, if we regard the p spaces as superposed, as we may, the centre of this (n − m)-dimensional
hypersphere is the foot of the perpendicular from $P_m$ on to the space containing
the points $O, P_1 \ldots P_{m-1}$. Call the length of this perpendicular for the time being $r_m$.

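The volume of a k-dimensional hypersphere of radius r is

$$ \frac{\pi^{k/2}\, r^{k}}{\Gamma\!\left( \tfrac{1}{2} k + 1 \right)} $$

and its surface area, obtained by differentiating with respect to r, is

$$ \frac{2 \pi^{k/2}\, r^{k-1}}{\Gamma\!\left( \tfrac{1}{2} k \right)}. \tag{28.16} $$

The surface area of the hypersphere of variation of $P_m$ is thus

$$ \frac{2 \pi^{(n-m)/2}\, r_m^{\,n-m-1}}{\Gamma\!\left( \frac{n-m}{2} \right)}. \tag{28.17} $$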
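The two hypersphere formulae above can be checked against the familiar circle and sphere. The following is a minimal sketch; the function names `ball_volume` and `sphere_surface` are illustrative and not part of the text.

```python
import math


def ball_volume(k, r):
    """Content of a k-dimensional hypersphere of radius r."""
    return math.pi ** (k / 2) * r ** k / math.gamma(k / 2 + 1)


def sphere_surface(k, r):
    """Surface area, i.e. the derivative of ball_volume with respect to r."""
    return 2 * math.pi ** (k / 2) * r ** (k - 1) / math.gamma(k / 2)


# k = 2 gives the area and circumference of a circle,
# k = 3 the volume and surface of an ordinary sphere.
circle_area = ball_volume(2, 1.0)
sphere_area = sphere_surface(3, 2.0)

# Numerical derivative of the volume, to compare with the surface formula.
step = 1e-6
numeric_surface = (ball_volume(5, 1.5 + step) - ball_volume(5, 1.5 - step)) / (2 * step)
```

With k = n − m and r = r_m the second function reproduces the surface area used for the variation of P_m.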


To find the element of volume due to the variation of $P_m$ and the angles which $OP_m$
makes with $OP_1 \ldots OP_{m-1}$ we have to multiply (28.17) by an element of variation
normal to the hypersphere of n − m dimensions. This variation lies in the hyperplane
determined by the origin and $P_1 \ldots P_m$ which is, in fact, normal to the hypersphere.
To evaluate it, consider the transformation

$$ \xi_{mj} = \sum_{k=1}^{m} x_{mk} x_{jk}, \qquad j = 1, \ldots, m, \tag{28.18} $$

where, of course, the x's are measured from the sample means in virtue of our choice of
origin. We have for the Jacobian

$$ \frac{\partial(\xi_{m1}, \ldots, \xi_{mm})}{\partial(x_{m1}, \ldots, x_{mm})} = \begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ 2x_{m1} & 2x_{m2} & \cdots & 2x_{mm} \end{vmatrix} = 2 v_m, \tag{28.19} $$

where $v_m$ is the volume (or "content") of the hyperparallelopiped having one corner at
the origin and edges running to the points $P_1 \ldots P_m$. Furthermore,

$$ |\xi_{ij}| = |x_{ik} x_{jk}| = v_m^2, \qquad i, j = 1, \ldots, m. \tag{28.20} $$

The required element is thus

$$ \frac{1}{2 v_m} \prod_{k=1}^{m} d\xi_{mk}, $$

and the total element of variation of $P_m$, on multiplication by (28.17), is

$$ \frac{\pi^{(n-m)/2}\, r_m^{\,n-m-1}}{\Gamma\!\left( \frac{n-m}{2} \right) v_m} \prod_{k=1}^{m} d\xi_{mk}. \tag{28.21} $$


Now $r_m$ is the length of the perpendicular from $P_m$ on to the space $OP_1 \ldots P_{m-1}$
and is therefore equal to $v_m / v_{m-1}$. Hence, for the variation of $P_m$ we have the element

$$ \frac{\pi^{(n-m)/2}}{\Gamma\!\left( \frac{n-m}{2} \right)} \cdot \frac{v_m^{\,n-m-2}}{v_{m-1}^{\,n-m-1}} \prod_{k=1}^{m} d\xi_{mk}. \tag{28.22} $$

We now derive the total element for variation of $P_1 \ldots P_p$ by multiplying expressions
of type (28.22) for m = 1, 2, . . . p. The terms in v cancel except $v_p$ and $v_0$, the latter
being unity, and we find

$$ \frac{\pi^{p(2n-p-1)/4}}{\prod_{j=1}^{p} \Gamma\!\left( \frac{n-j}{2} \right)}\; v_p^{\,n-p-2} \prod_{j \le k} d\xi_{jk}. \tag{28.23} $$

Now from (28.18) we have

$$ \xi_{jk} = n a_{jk} \tag{28.24} $$

and from (28.20)

$$ v_p^2 = n^p\, |a|. \tag{28.25} $$

Making the necessary substitutions in (28.23) and adjoining the frequency element given
by (28.13) we find, after a little reduction,

$$ dF = \frac{(n/2)^{p(n-1)/2}\, |A|^{(n-1)/2}\, |a|^{(n-p-2)/2}}{\pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left( \frac{n-j}{2} \right)} \exp\left( -\tfrac{1}{2} n A^{ij} a_{ij} \right) \prod da. \tag{28.26} $$

This is Wishart's generalisation of the distribution of dispersions in a multivariate
normal system. The reader who feels that the foregoing proof demands too much of his
powers of geometrical insight may refer to alternative derivations by Wishart and Bartlett
(1933c) or P. L. Hsu (1939a). The domain of variation of the a's is 0 to ∞ for $a_{ii}$ and
corresponding values for $a_{ij}$, $i \ne j$, such that correlations do not exceed unity in absolute
value.
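As a rough numerical check on the constants in (28.26), the density can be coded directly and, for p = 1, integrated over the single dispersion a_11. This is only a sketch: the function names are illustrative, the determinant routine is a simple Laplace expansion adequate for small p, and the matrices are assumed to be symmetric lists of lists.

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row (small p only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


def wishart_log_density(a, A, n):
    """Logarithm of (28.26): a holds the dispersions a_ij, A the elements A^ij."""
    p = len(a)
    # A^ij a_ij summed over all i, j, so each off-diagonal product is counted twice.
    quad = sum(A[i][j] * a[i][j] for i in range(p) for j in range(p))
    log_const = (0.5 * p * (n - 1) * math.log(n / 2)
                 + 0.5 * (n - 1) * math.log(det(A))
                 - 0.25 * p * (p - 1) * math.log(math.pi)
                 - sum(math.lgamma(0.5 * (n - j)) for j in range(1, p + 1)))
    return log_const + 0.5 * (n - p - 2) * math.log(det(a)) - 0.5 * n * quad


# p = 1, n = 6, unit parent variance: midpoint rule over a_11 from 0 to 20.
h = 0.001
total_frequency = sum(
    math.exp(wishart_log_density([[(i + 0.5) * h]], [[1.0]], 6)) * h
    for i in range(20000)
)

# A bivariate value, in standard measure with parent correlation 0.5.
rho = 0.5
A2 = [[1 / (1 - rho ** 2), -rho / (1 - rho ** 2)],
      [-rho / (1 - rho ** 2), 1 / (1 - rho ** 2)]]
log_f = wishart_log_density([[1.0, 0.4], [0.4, 0.9]], A2, 10)
```

For p = 1 the total frequency should come out very close to unity, since the formula then reduces to the distribution of a single sample variance.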


28.10. It must be remembered that we are regarding $a_{ij}$ as the same as $a_{ji}$ and that
the product of differential elements in (28.26) contains $\tfrac{1}{2} p(p+1)$ items, not $p^2$; for there
are p elements of the form $da_{ii}$ and $\tfrac{1}{2} p(p-1)$ of the form $da_{ij}$, $i \ne j$. The expanded form
of $A^{ij} a_{ij}$, however, takes place over i, j from 1 to p, so that any particular term such as
$A^{34} a_{34}$ occurs twice, once as $A^{34} a_{34}$ and once as $A^{43} a_{43}$; except that when i = j the term
occurs once. For instance, with p = 2 we have

$$ A^{ij} a_{ij} = A^{11} a_{11} + 2 A^{12} a_{12} + A^{22} a_{22}. \tag{28.27} $$

We can now derive the characteristic function of the Wishart distribution. Ignoring
constant factors and writing a single integral sign for summation over all $a_{ij}$, we have,
from (28.26),

$$ \int |a|^{(n-p-2)/2} \exp\left( -\tfrac{1}{2} n A^{ij} a_{ij} \right) \prod da = \frac{K}{|A|^{(n-1)/2}}, \tag{28.28} $$

where K is some constant. In this form let us replace $A^{ij}$ by $A^{ij} - \frac{1}{n} \theta_{ij}$ when $i \ne j$ and
by $A^{ii} - \frac{2}{n} \theta_{ii}$ when i = j. Then the resulting integral is the characteristic function of
the a's, $\theta_{ij}$ being the parameter $it_{ij}$ corresponding to $a_{ij}$. We thus have

$$ \phi(\theta_{ij}) = \frac{|A|^{(n-1)/2}}{\begin{vmatrix} A^{11} - \frac{2}{n}\theta_{11} & A^{12} - \frac{1}{n}\theta_{12} & \cdots & A^{1p} - \frac{1}{n}\theta_{1p} \\ A^{12} - \frac{1}{n}\theta_{12} & A^{22} - \frac{2}{n}\theta_{22} & \cdots & A^{2p} - \frac{1}{n}\theta_{2p} \\ \vdots & \vdots & & \vdots \\ A^{1p} - \frac{1}{n}\theta_{1p} & A^{2p} - \frac{1}{n}\theta_{2p} & \cdots & A^{pp} - \frac{2}{n}\theta_{pp} \end{vmatrix}^{(n-1)/2}}, \tag{28.29} $$

the constant being evaluated by the consideration that $\phi(0) = 1$.

Example 28.1 

Let us apply these results to an examination of the moments of the distribution of
covariance in the bivariate case. We have

$$ A^{11} = A^{22} = \frac{1}{1-\rho^2}, \qquad A^{12} = \frac{-\rho}{1-\rho^2}. $$

We then find for the c.f. of $a_{11}, a_{12}, a_{22}$

$$ \phi \propto \begin{vmatrix} \dfrac{1}{1-\rho^2} - \dfrac{2\theta_{11}}{n} & \dfrac{-\rho}{1-\rho^2} - \dfrac{\theta_{12}}{n} \\[2mm] \dfrac{-\rho}{1-\rho^2} - \dfrac{\theta_{12}}{n} & \dfrac{1}{1-\rho^2} - \dfrac{2\theta_{22}}{n} \end{vmatrix}^{-(n-1)/2}. $$

We are interested only in the parameter $\theta_{12}$ which we will write as θ, putting the others
equal to zero. We then find

$$ \phi(\theta) = \left\{ 1 - \frac{2\rho\theta}{n} - \frac{(1-\rho^2)\theta^2}{n^2} \right\}^{-(n-1)/2}. $$

Taking logarithms and evaluating coefficients of powers of θ, we find for the cumulants

$$ \kappa_1 = \frac{n-1}{n}\, \rho, \qquad \kappa_2 = \frac{n-1}{n^2}\, (1 + \rho^2), $$

$$ \kappa_3 = \frac{2(n-1)}{n^3}\, \rho (3 + \rho^2), \qquad \kappa_4 = \frac{6(n-1)}{n^4}\, (1 + 6\rho^2 + \rho^4). $$

In standard measure the distribution tends to normality as n tends to infinity. But for
finite n we have

$$ \beta_1 = \frac{4}{n-1} \cdot \frac{\rho^2 (3 + \rho^2)^2}{(1 + \rho^2)^3}, \qquad \beta_2 = 3 + \frac{6}{n-1} \cdot \frac{1 + 6\rho^2 + \rho^4}{(1 + \rho^2)^2}. $$

Thus, even when ρ = 0 our distribution, though symmetrical, is not normal.
Wishart (1928) has given formulae as far as those of the fourth order for eight or
fewer variates.
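The cumulants of Example 28.1 can be checked by expanding the logarithm of the characteristic function as a power series in θ with exact rational arithmetic. The sketch below assumes the form φ(θ) = {1 − 2ρθ/n − (1 − ρ²)θ²/n²}^{−(n−1)/2}, with θ standing for it, so that the rth cumulant is r! times the coefficient of θ^r; the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b, order):
    """Product of two truncated power series (lists of coefficients)."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out


def covariance_cumulants(n, rho, order=4):
    """Cumulants kappa_1..kappa_order of the sample covariance a_12."""
    rho = Fraction(rho)
    # u = 2*rho*theta/n + (1 - rho^2)*theta^2/n^2, so log phi = -(n-1)/2 * log(1 - u).
    u = [Fraction(0), 2 * rho / n, (1 - rho * rho) / Fraction(n) ** 2]
    u = (u + [Fraction(0)] * (order + 1))[:order + 1]
    series = [Fraction(0)] * (order + 1)      # -log(1 - u) = u + u^2/2 + u^3/3 + ...
    power = [Fraction(1)] + [Fraction(0)] * order
    for m in range(1, order + 1):
        power = poly_mul(power, u, order)
        for degree in range(order + 1):
            series[degree] += power[degree] / m
    half = Fraction(n - 1, 2)
    return [half * series[r] * factorial(r) for r in range(1, order + 1)]


n, rho = 10, Fraction(1, 2)
k1, k2, k3, k4 = covariance_cumulants(n, rho)
beta_1 = k3 ** 2 / k2 ** 3
beta_2 = 3 + k4 / k2 ** 2
```

Because u has no constant term, powers of u beyond the fourth cannot contribute to the first four cumulants, so the truncation is exact.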


Hotelling's Distribution 


28.11. In the univariate case we can test the significance of a mean by comparing 
it with the estimated standard deviation, the ratio being distributed in “ Student’s ” form 
(or some simple transformation of it if we compare the mean with the actual sample variance 
and not the unbiassed estimator). We proceed to generalise this result. 

We require a single quantity which will serve as a measure of departure of all the means
$\bar{x}_j$ from the population values which, as usual, we take to be zero. In place of the matrix
of dispersions, we shall consider the matrix of sums of squares and products $(b_{ij})$ where

$$ b_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j). \tag{28.30} $$

As usual we take $(b^{ij})$ to be the matrix inverse to $(b_{ij})$. Let us now write

$$ T^2 = n(n-1)\, b^{ij} \bar{x}_i \bar{x}_j. \tag{28.31} $$

This is Hotelling's generalisation of the "Student" ratio t.

In the simplest case when p = 1 we have

$$ b_{11} = n s^2 $$

and hence

$$ T^2 = \frac{(n-1)\, \bar{x}^2}{s^2}, \tag{28.32} $$

so that T becomes equal to the ratio t as required.


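28.12. We have

$$ \frac{T^2}{n-1} = n\, b^{ij} \bar{x}_i \bar{x}_j. \tag{28.33} $$

Let us now denote by $m_{ij}$ the sum of squares or products about the origin, so that

$$ m_{ij} = b_{ij} + n \bar{x}_i \bar{x}_j. \tag{28.34} $$

The determinant of $m_{ij}$ may be written

$$ |m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1 \sqrt{n} & \bar{x}_2 \sqrt{n} & \cdots & \bar{x}_p \sqrt{n} \\ 0 & b_{11} + n\bar{x}_1^2 & b_{12} + n\bar{x}_1 \bar{x}_2 & \cdots & b_{1p} + n\bar{x}_1 \bar{x}_p \\ 0 & b_{12} + n\bar{x}_2 \bar{x}_1 & b_{22} + n\bar{x}_2^2 & \cdots & b_{2p} + n\bar{x}_2 \bar{x}_p \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & b_{1p} + n\bar{x}_p \bar{x}_1 & b_{2p} + n\bar{x}_p \bar{x}_2 & \cdots & b_{pp} + n\bar{x}_p^2 \end{vmatrix}. $$

On subtracting $\bar{x}_1 \sqrt{n}$ times the first row from the second, and so on, we find

$$ |m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1 \sqrt{n} & \cdots & \bar{x}_p \sqrt{n} \\ -\bar{x}_1 \sqrt{n} & b_{11} & \cdots & b_{1p} \\ \vdots & \vdots & & \vdots \\ -\bar{x}_p \sqrt{n} & b_{1p} & \cdots & b_{pp} \end{vmatrix}, $$

and on expanding according to the border row and column,

$$ |m_{ij}| = |b_{ij}| + n\, b^{ij} \bar{x}_i \bar{x}_j\, |b_{ij}|. \tag{28.35} $$

It follows that

$$ |m_{ij}| = \left( 1 + \frac{T^2}{n-1} \right) |b_{ij}| $$

or

$$ \frac{1}{1 + \dfrac{T^2}{n-1}} = \frac{|b_{ij}|}{|m_{ij}|}. \tag{28.36} $$

This is a fundamental equation in the sampling theory of T and we proceed to interpret
it geometrically.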
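The relation (28.36) is easy to verify numerically for p = 2, where the inverse matrix and the determinants can be written out by hand. The following sketch uses a small made-up sample; the data and the function name `t2_and_ratio` are illustrative only.

```python
def t2_and_ratio(x1, x2):
    """Return Hotelling's T^2 of (28.31) and the ratio |b_ij| / |m_ij| for p = 2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # Sums of squares and products about the sample means, as in (28.30).
    b11 = sum((u - m1) ** 2 for u in x1)
    b22 = sum((v - m2) ** 2 for v in x2)
    b12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    det_b = b11 * b22 - b12 ** 2
    # b^ij xbar_i xbar_j, using the explicit inverse of a 2 x 2 matrix.
    quad = (b22 * m1 * m1 - 2 * b12 * m1 * m2 + b11 * m2 * m2) / det_b
    t2 = n * (n - 1) * quad
    # Sums of squares and products about the origin, as in (28.34).
    m11, m22, m12 = b11 + n * m1 * m1, b22 + n * m2 * m2, b12 + n * m1 * m2
    det_m = m11 * m22 - m12 ** 2
    return t2, det_b / det_m


x1 = [1.0, 2.0, 4.0, 3.0, 5.0]
x2 = [2.0, 1.0, 3.0, 5.0, 4.0]
t2, ratio = t2_and_ratio(x1, x2)
lhs = 1.0 / (1.0 + t2 / (len(x1) - 1))
```

Here `lhs` and `ratio` are the two sides of (28.36) and should agree to rounding error.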


28.13. In the case p = 1 we have a single sample space of n dimensions. The numerator
and denominator of (28.36) then reduce to $b_{11}$ and $m_{11}$, that is to say, the squares of
distances from the sample-point $P_1$ to its projection on the unit vector whose direction
cosines are all equal, and from $P_1$ to the origin, respectively. The ratio of (28.36) has
zero dimensions and is in fact the square of the sine of the angle between $OP_1$ and the unit
vector. This is the geometrical approach which gave us "Student's" distribution in
Example 10.6 (vol. I, p. 239).

In the general case let us regard the p n-spaces as superposed in one n-space. The
points $P_1 \ldots P_p$ will lie in a space of p − 1 dimensions, a hyperplane in the n-space.
Now we may rotate the axis without altering the functions $|m_{ij}|$ or $|b_{ij}|$ which are easily
seen to be invariant under orthogonal variate-transformations. If we perform such a
rotation so as to bring the (p − 1)-space of sample-points into correspondence with p − 1
co-ordinate dimensions, we see from (28.20) that $|m_{ij}|$ is the square of the content of a
hyperparallelopiped with one corner at the origin and sides parallel to $OP_1 \ldots OP_p$.

Now consider a hyperplane perpendicular to the unit vector meeting it, say, in O',
and let $P'_1 \ldots P'_p$ be the projections of the points P on to this hyperplane. Then $b_{ij}$
is the covariance of the co-ordinates of $P'_i$ and $P'_j$ referred to O', and hence $|b_{ij}|$ is the square
of the content of the hyperparallelopiped in the hyperplane. Furthermore, the content
of this figure bears to that given by $|m_{ij}|$ a ratio equal to the cosine of the angle between
the unit vector and the hyperplane. Representing this angle by θ, we have

$$ \frac{1}{1 + \dfrac{T^2}{n-1}} = \frac{|b_{ij}|}{|m_{ij}|} = \cos^2 \theta. \tag{28.37} $$


28.14. Now if the sample-points P are distributed in the n-space with random
orientation, the hyperplane which they determine will be distributed randomly in regard
to the angle which it makes with a fixed vector, and in particular with the unit vector.
The sampling distribution of θ is then that of an angle between a fixed vector and a random
plane. But this, from a slightly different viewpoint, is precisely the problem of distribution
which we solved in connection with the multiple correlation coefficient R, for we saw (15.18,
vol. I, p. 381) that R is the sine of the angle between a residual vector represented by the
variate $x_{1.2 \ldots p}$ and the space containing other variates $x_2 \ldots x_p$; and in the case when
the former is independent of the latter we can regard it as fixed. Thus, from (28.37) we
may write

$$ \frac{1}{1 + \dfrac{T^2}{n-1}} = 1 - R^2. \tag{28.38} $$

The distribution of $R^2$ in the case when the variate concerned is independent of the
others is

$$ dF = \frac{1}{B\!\left( \frac{n-p}{2}, \frac{p-1}{2} \right)} (1 - R^2)^{(n-p-2)/2} (R^2)^{(p-3)/2}\, dR^2, \tag{28.39} $$

where we must remember that p is the total number of variates and the variates are measured
from their means in forming the regression equation. Before substituting (28.38) in this
expression we must increase p by unity, since in effect we are considering p + 1 variates,
the unit vector determining an additional one; and we must also increase n by unity
because our variation is not restricted to that about the mean, as for multiple correlation.
With these alterations in (28.39), we have, on substituting for R from (28.38) and a little
reduction,

$$ dF = \frac{2\, \Gamma\!\left( \frac{n}{2} \right)}{(n-1)^{p/2}\, \Gamma\!\left( \frac{p}{2} \right) \Gamma\!\left( \frac{n-p}{2} \right)} \cdot \frac{T^{p-1}\, dT}{\left( 1 + \dfrac{T^2}{n-1} \right)^{n/2}}. \tag{28.40} $$

This is the distribution of Hotelling's generalisation of "Student's" ratio.


28.15. At the end of the chapter we shall see that this is a particular case of a more
general distribution (28.31). A third and instructive derivation, due to Wilks, is as
follows:

From the manner of derivation of Wishart's distribution it will be clear that if we
substitute the moments about the origin $a'_{ij}$ for those about the mean $a_{ij}$, the distribution
is the same, except that there is an extra degree of freedom. The distribution is then

$$ dF = \frac{(n/2)^{pn/2}\, |A|^{n/2}\, |a'|^{(n-p-1)/2}}{\pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\!\left( \frac{n+1-k}{2} \right)} \exp\left( -\tfrac{1}{2} n A^{ij} a'_{ij} \right) \prod da'. $$

Putting $B^{ij} = \tfrac{1}{2} n A^{ij}$, we find, on integration,

$$ \int |a'|^{(n-p-1)/2} \exp\left( -B^{ij} a'_{ij} \right) \prod da' = \frac{\pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\!\left( \frac{n+1-k}{2} \right)}{|B|^{n/2}}. \tag{28.41} $$

Now replace n by n + 2r in this expression and divide by the term on the right in (28.41).
The result is to give us the rth moment of $|a'|$ as

$$ \mu'_r(|a'|) = \frac{1}{|B|^r} \prod_{k=1}^{p} \frac{\Gamma\!\left( \frac{n+1-k}{2} + r \right)}{\Gamma\!\left( \frac{n+1-k}{2} \right)}. \tag{28.42} $$

We may also write the distribution of $a'_{ij}$ in the form given by our original derivation of
Wishart's distribution:

$$ dF = \frac{|B|^{(n-1)/2}\, |a|^{(n-p-2)/2}}{\pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\!\left( \frac{n-k}{2} \right)} \exp\left( -B^{ij} a_{ij} \right) \prod da \times \frac{|B|^{1/2}}{\pi^{p/2}} \exp\left( -B^{ij} \bar{x}_i \bar{x}_j \right) \prod d\bar{x}. $$

Multiply this by $|a'|^r$, integrate, and use (28.42), transferring constant terms to the right
as in (28.41); then replace n by n + 2s and divide by the constant terms as they were
before substitution. We find

$$ E\left( |a'|^r\, |a|^s \right) = \frac{1}{|B|^{r+s}} \prod_{k=1}^{p} \frac{\Gamma\!\left( \frac{n+1-k}{2} + r + s \right) \Gamma\!\left( \frac{n-k}{2} + s \right)}{\Gamma\!\left( \frac{n+1-k}{2} + s \right) \Gamma\!\left( \frac{n-k}{2} \right)}. \tag{28.43} $$

Now put r = −s and note that

$$ \frac{|a|}{|a'|} = \frac{|b|}{|m|}. $$

We find

$$ E\left\{ \left( \frac{|b|}{|m|} \right)^{s} \right\} = \frac{\Gamma\!\left( \frac{n}{2} \right) \Gamma\!\left( \frac{n-p}{2} + s \right)}{\Gamma\!\left( \frac{n-p}{2} \right) \Gamma\!\left( \frac{n}{2} + s \right)}. \tag{28.44} $$

Now the function on the right is the sth moment of

$$ dF = \frac{1}{B\!\left( \frac{n-p}{2}, \frac{p}{2} \right)}\, x^{(n-p-2)/2} (1 - x)^{(p-2)/2}\, dx, \tag{28.45} $$

which is uniquely determined by its moments. This, then, is the distribution of the ratio
$|b| / |m|$, and on substitution in terms of T from (28.36) brings us back to the distribution of
(28.40). Incidentally this method gives us one more derivation of the distribution of
multiple correlations and correlation ratios when the respective variates are independent.
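The step from (28.43) to (28.44) rests on the telescoping of a product of Gamma functions. As a sketch of that step, assuming the product form of (28.43) with r = −s, the two expressions can be compared numerically; the function names are illustrative.

```python
import math


def ratio_moment_product(n, p, s):
    """sth moment of |b|/|m| from the product over k = 1..p, i.e. (28.43) with r = -s."""
    log_moment = 0.0
    for k in range(1, p + 1):
        log_moment += (math.lgamma((n - k) / 2 + s) + math.lgamma((n + 1 - k) / 2)
                       - math.lgamma((n - k) / 2) - math.lgamma((n + 1 - k) / 2 + s))
    return math.exp(log_moment)


def ratio_moment_beta(n, p, s):
    """sth moment of the distribution (28.45): B(a + s, b) / B(a, b), a = (n-p)/2, b = p/2."""
    a, b = (n - p) / 2, p / 2
    return math.exp(math.lgamma(a + s) + math.lgamma(a + b)
                    - math.lgamma(a) - math.lgamma(a + b + s))


# A few arbitrary cases; s need not be an integer.
checks = [(ratio_moment_product(n, p, s), ratio_moment_beta(n, p, s))
          for n, p, s in [(12, 2, 1.0), (20, 5, 2.5), (9, 3, 0.5)]]
```

Each pair in `checks` should agree, which is the content of the remark that the right-hand side of (28.44) is the sth moment of (28.45).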


Significance of a Set of Means 

28.16. Suppose that we have a set of k samples with numbers $n_1 \ldots n_k$, each
from a p-variate population. Let us also suppose that the populations have the same
dispersion matrix but different means, that of the jth variate in the lth sample being $\mu_j^{(l)}$.
We proceed to derive a criterion for testing the means simultaneously. Our result is a
generalisation of the testing of k means in normal samples, and we shall obtain it by applying
the same method, namely by using the likelihood criterion

$$ \lambda = \frac{P_0\,(\omega\ \text{max.})}{P_1\,(\Omega\ \text{max.})} $$

as given in equation (26.64). Here ω is the domain for which all the means of the jth
variate have a common value $\mu_j$ and Ω that for which they have the more general values
$\mu_j^{(l)}$.

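Let $b_{ij}^{(l)}$ be the function $b_{ij}$ for the lth sample (l = 1, 2, . . . k) and $\bar{x}_i^{(l)}$ the mean
of the ith variate in that sample. Put

$$ b'_{ij} = \sum_{l=1}^{k} b_{ij}^{(l)}, \tag{28.46} $$

where, of course,

$$ b_{ij}^{(l)} = \sum_{t=1}^{n_l} \left( x_{it}^{(l)} - \bar{x}_i^{(l)} \right) \left( x_{jt}^{(l)} - \bar{x}_j^{(l)} \right). \tag{28.47} $$

Put, for the functions of the pooled samples, with $n = \sum_l n_l$,

$$ \bar{x}_i = \frac{1}{n} \sum_{l, t} x_{it}^{(l)} = \frac{1}{n} \sum_{l} n_l\, \bar{x}_i^{(l)}, \tag{28.48} $$

$$ b_{ij} = \sum_{l, t} \left( x_{it}^{(l)} - \bar{x}_i \right) \left( x_{jt}^{(l)} - \bar{x}_j \right). \tag{28.49} $$

If then

$$ m_{ij}^{(l)} = \sum_{t} \left( x_{it}^{(l)} - \mu_i^{(l)} \right) \left( x_{jt}^{(l)} - \mu_j^{(l)} \right), \tag{28.50} $$

the likelihood of all samples together is

$$ c\, |A|^{n/2} \exp\left\{ -\tfrac{1}{2} \sum_{l} \left( A^{ij} m_{ij}^{(l)} \right) \right\}, \tag{28.51} $$

where c is a constant.

Taking logarithms and differentiating, we have for the maximum value equations
typified by

$$ \sum_{i} \sum_{t=1}^{n_l} A^{ij} \left\{ \left( x_{it}^{(l)} - \mu_i^{(l)} \right) + \left( x_{jt}^{(l)} - \mu_j^{(l)} \right) \right\} = 0, $$

which reduce to

$$ \bar{x}_i^{(l)} = \mu_i^{(l)}. \tag{28.52} $$

The maximum likelihood values of the m's are then given by

$$ m_{ij}^{(l)} = b_{ij}^{(l)}. $$

Furthermore, the values of $A^{ij}$ are then given by the inverse of the matrix $\frac{1}{n} b'_{ij}$, and the
exponent of (28.51) becomes

$$ -\tfrac{1}{2} \sum_{l} \left( A^{ij} b_{ij}^{(l)} \right) = -\tfrac{1}{2} np. \tag{28.53} $$

Thus

$$ P_1\,(\Omega\ \text{max.}) = c \left| \tfrac{1}{n} b'_{ij} \right|^{-n/2} e^{-np/2}, \tag{28.54} $$

and in the same way, the common means under ω being estimated by the pooled means $\bar{x}_i$,

$$ P_0\,(\omega\ \text{max.}) = c \left| \tfrac{1}{n} b_{ij} \right|^{-n/2} e^{-np/2}. \tag{28.55} $$

Hence

$$ \lambda = \left\{ \frac{\left| \frac{1}{n} b'_{ij} \right|}{\left| \frac{1}{n} b_{ij} \right|} \right\}^{n/2}, $$

and we may write

$$ L = \lambda^{2/n} = \frac{\left| \frac{1}{n} b'_{ij} \right|}{\left| \frac{1}{n} b_{ij} \right|} \tag{28.56} $$

and take L as our criterion.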
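A small bivariate illustration may make the criterion concrete. The sketch below assumes that L is the ratio of the determinant of the pooled within-sample sums of squares and products to the determinant of the sums about the general means, as in (28.56); the data and all function names are made up for illustration.

```python
def mean2(points):
    """Mean vector of a list of (x, y) pairs."""
    n = len(points)
    return (sum(pt[0] for pt in points) / n, sum(pt[1] for pt in points) / n)


def sums_about(points, centre):
    """Sums of squares and products (b11, b12, b22) about a given centre."""
    b11 = sum((x - centre[0]) ** 2 for x, _ in points)
    b12 = sum((x - centre[0]) * (y - centre[1]) for x, y in points)
    b22 = sum((y - centre[1]) ** 2 for _, y in points)
    return b11, b12, b22


def det2(b):
    """Determinant of the symmetric 2 x 2 matrix stored as (b11, b12, b22)."""
    return b[0] * b[2] - b[1] ** 2


def criterion_L(samples):
    """Within-sample determinant divided by the determinant about the general means."""
    within = [0.0, 0.0, 0.0]
    for sample in samples:
        for idx, value in enumerate(sums_about(sample, mean2(sample))):
            within[idx] += value
    pooled = [pt for sample in samples for pt in sample]
    total = sums_about(pooled, mean2(pooled))
    return det2(within) / det2(total)


first = [(1, 2), (2, 1), (3, 4), (4, 3)]
shifted = [(6, 2), (7, 1), (8, 4), (9, 3)]           # same scatter, different mean
same_mean = [(2, 2), (3, 3), (2.5, 2.5), (2.5, 2.5)]  # same mean as `first`

L_shifted = criterion_L([first, shifted])
L_same = criterion_L([first, same_mean])
```

When the sample means coincide the two determinants are equal and L is unity; separation of the means inflates the denominator and drives L towards zero.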


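28.17. The distribution of L for general k is not easily expressible, but we may
determine its moments by the method employed in 28.15. The functions $\frac{1}{n} b_{ij}$ are
distributed in Wishart's form and their moments accordingly given by equations of the type
(28.42) with n replaced by n − 1, namely,

$$ \mu'_r(|b_{ij}|) = \frac{n^{pr}}{|B|^r} \prod_{m=1}^{p} \frac{\Gamma\!\left( \frac{n-m}{2} + r \right)}{\Gamma\!\left( \frac{n-m}{2} \right)}. \tag{28.57} $$

Now each $b_{ij}^{(l)}$ is distributed in Wishart's form, and therefore their sum is so distributed
(cf. Exercise 28.3). In the manner of 28.15 (we omit the details) it is found that

$$ \mu'_r(L) = \prod_{m=1}^{p} \frac{\Gamma\!\left( \frac{n-m}{2} \right) \Gamma\!\left( \frac{n-m+1-k}{2} + r \right)}{\Gamma\!\left( \frac{n-m}{2} + r \right) \Gamma\!\left( \frac{n-m+1-k}{2} \right)}, \tag{28.58} $$

where we now use m as an index of summation, reserving k for the number of samples.
This gives us the moments of L.

In the case k = 2 we have

$$ \mu'_r(L) = \frac{\Gamma\!\left( \frac{n-1}{2} \right) \Gamma\!\left( \frac{n-p-1}{2} + r \right)}{\Gamma\!\left( \frac{n-1}{2} + r \right) \Gamma\!\left( \frac{n-p-1}{2} \right)}, \tag{28.59} $$

and hence the distribution of L is in the form

$$ dF = \frac{1}{B\!\left( \frac{n-p-1}{2}, \frac{p}{2} \right)}\, L^{(n-p-3)/2} (1 - L)^{(p-2)/2}\, dL. \tag{28.60} $$

In the case k = 3 we find

$$ \mu'_r(L) = \frac{\Gamma\!\left( \frac{n-1}{2} \right) \Gamma\!\left( \frac{n-2}{2} \right) \Gamma\!\left( \frac{n-p-1}{2} + r \right) \Gamma\!\left( \frac{n-p-2}{2} + r \right)}{\Gamma\!\left( \frac{n-1}{2} + r \right) \Gamma\!\left( \frac{n-2}{2} + r \right) \Gamma\!\left( \frac{n-p-1}{2} \right) \Gamma\!\left( \frac{n-p-2}{2} \right)}, $$

which, in virtue of the relation

$$ \Gamma\!\left( x + \tfrac{1}{2} \right) \Gamma(x + 1) = \frac{\sqrt{\pi}\; \Gamma(2x + 1)}{2^{2x}}, $$

becomes

$$ \mu'_r(L) = \frac{\Gamma(n-2)\, \Gamma(n-p-2+2r)}{\Gamma(n-2+2r)\, \Gamma(n-p-2)}. \tag{28.61} $$

These are the moments of the distribution

$$ dF = \frac{1}{2 B(n-p-2,\, p)}\, L^{(n-p-4)/2} \left( 1 - \sqrt{L} \right)^{p-1} dL, \tag{28.62} $$

a rather unusual form. The results are due to Wilks.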
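The reductions for k = 2 and k = 3 can be confirmed numerically from the general product (28.58). The sketch below works with logarithms of Gamma functions to avoid overflow; the function names and the particular values of n, p and r are illustrative.

```python
import math


def moment_general(n, p, k, r):
    """rth moment of L from the product over m = 1..p in (28.58)."""
    log_moment = 0.0
    for m in range(1, p + 1):
        log_moment += (math.lgamma((n - m) / 2) + math.lgamma((n - m + 1 - k) / 2 + r)
                       - math.lgamma((n - m) / 2 + r) - math.lgamma((n - m + 1 - k) / 2))
    return math.exp(log_moment)


def moment_k2(n, p, r):
    """Telescoped form for two samples, as in (28.59)."""
    return math.exp(math.lgamma((n - 1) / 2) + math.lgamma((n - p - 1) / 2 + r)
                    - math.lgamma((n - 1) / 2 + r) - math.lgamma((n - p - 1) / 2))


def moment_k3(n, p, r):
    """Form for three samples after applying the duplication formula, as in (28.61)."""
    return math.exp(math.lgamma(n - 2) + math.lgamma(n - p - 2 + 2 * r)
                    - math.lgamma(n - 2 + 2 * r) - math.lgamma(n - p - 2))


pair_k2 = (moment_general(20, 3, 2, 0.7), moment_k2(20, 3, 0.7))
pair_k3 = (moment_general(20, 3, 3, 1.5), moment_k3(20, 3, 1.5))
```

Agreement of each pair reflects the telescoping of the product in steps of one (k = 2) and of two (k = 3).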


28.18. The line of generalisation of univariate analysis will now probably be clear.
Corresponding to most of our results for a single variate there will be a generalised result
for p variates; and, in fact, if we like to regard the p-variate as a vector we can often draw
direct analogies between results for vectors and those for the (univariate) scalar. It is
of special interest to observe that the role played by the variance in univariate theory is
taken over by the determinant of the dispersion matrix in multivariate theory.

Up to this point we have generalised the distribution of variance (the $\chi^2$-distribution)
into Wishart's form, and the t-distribution into Hotelling's form.

Other results which suggest themselves for generalisation are regression and variance 
analysis. But in a sense our treatment of regressions is already general, for we have dis¬ 
cussed the regression of one variate on p — 1 others. Below we shall go further and 
examine the relations between p dependent and q independent variates. In vector lan¬ 
guage, we consider the regression of a p -way vector y on a q -way vector x. We have also 
considered the analysis of variance for the bivariate and trivariate case in Chapter 24 
under the title of analysis of covariance, and since the interest lies mainly in the direction 
of regressions we shall not take the subject further here, though it is capable of develop¬ 
ment and even, perhaps, of application if data become available in sufficient abundance. 
In the remainder of the chapter we shall, in the first instance, deal with an offshoot of 
regression theory which has some interesting taxonomic applications, namely discrimina¬ 
tory analysis ; and we shall then proceed to the general problem of the relationship between 
two sets of variates. 


Discriminatory Analysis 

28.19. Suppose we have p observations for each of 2n sample members, and that
each member can have emanated from one of two populations, n to each population. We
require to find some measurement depending on the p observations which will enable us
to assign subsequently drawn members correctly to their parent populations with the
greatest assurance of success. For this purpose we shall find p quantities $\lambda^1 \ldots \lambda^p$ and
a discriminant function X related linearly to the variates by

$$ X = \lambda^j x_j. \tag{28.63} $$

The criterion on which we shall rely is that the λ's must be chosen to maximise the ratio
of the difference between sample means to the standard deviation within the two classes.

Any linear function of type (28.63) has variance S, given by

$$ S = \lambda^i \lambda^j a_{ij}, \tag{28.64} $$

where, as usual, $a_{ij}$ is the covariance of $x_i$ and $x_j$ which we assume to be the same for both
populations. Further, if the difference of the two means of $x_j$ is $d_j$, the difference of the
function X for the two samples is

$$ D = \lambda^i d_i. \tag{28.65} $$

We have then to maximise for variation in λ the function

$$ \frac{D^2}{S} = \frac{(\lambda^i d_i)^2}{\lambda^i \lambda^j a_{ij}}. \tag{28.66} $$

This gives for each λ

$$ \frac{1}{2S} \frac{\partial S}{\partial \lambda} = \frac{1}{D} \frac{\partial D}{\partial \lambda}, $$

leading to equations typified by

$$ \lambda^j a_{ij} = \frac{S}{D}\, d_i. \tag{28.67} $$

Multiplying by $a^{ik}$ and summing over i, we have

$$ \lambda^j a_{ij} a^{ik} = \frac{S}{D}\, d_i a^{ik} $$

or, since $a_{ij} a^{ik}$ vanishes unless j = k, when it is unity, and replacing k by j,

$$ \lambda^j = \frac{S}{D}\, d_i a^{ij}. \tag{28.68} $$

This determines the λ's, except for the constant which can be chosen at will so far as the
discriminant function is concerned. If c is some constant, we have

$$ \lambda^j = c\, d_i a^{ij}. \tag{28.69} $$

The result also holds if there are $n_1$ members in the first sample and $n_2$ in the second.
Equation (28.65) remains true, and the rest of the analysis is the same as for equal
class-numbers.


Example 28.2 (from R. A. Fisher, 1936a). 

Measurements were made on fifty specimens of flowers from each of two species of
iris, setosa and versicolor, found growing in the same colony. Four measurements were
taken, viz. sepal length, sepal width, petal length, and petal width. We denote them by
$x_1$, $x_2$, $x_3$ and $x_4$ respectively.

The means of the specimens were (in centimetres):

    Variate.     Versicolor.     Setosa.     Difference (V − S).

    x_1            5.936          5.006           0.930
    x_2            2.770          3.428          -0.658
    x_3            4.260          1.462           2.798
    x_4            1.326          0.246           1.080

The sums of squares and products about the means were (in cm.²):



             x_1         x_2         x_3         x_4

    x_1    19.1434      9.0356      9.7634      3.2394
    x_2     9.0356     11.8658      4.6232      2.4746
    x_3     9.7634      4.6232     12.2978      3.8794
    x_4     3.2394      2.4746      3.8794      2.4604

The inverse matrix is, in cm.⁻²:

             x_1            x_2            x_3            x_4

    x_1    0.118,7161    -0.066,8666    -0.081,6158     0.039,6350
    x_2   -0.066,8666     0.145,2736     0.033,4101    -0.110,7529
    x_3   -0.081,6158     0.033,4101     0.219,3614    -0.272,0206
    x_4    0.039,6350    -0.110,7529    -0.272,0206     0.894,5506

We need not bother to divide these quantities by n because there is an arbitrary constant
in our discriminant function which absorbs it. The matrices are diagonally symmetric,
and it is not always necessary to write out the values below the diagonal as we
have done here.

From (28.69), with c = 1, we then find

    λ^1 = -0.031,1511        λ^2 = -0.183,9075
    λ^3 =  0.222,1044        λ^4 =  0.314,7370.

If we choose the coefficient of $x_1$ to be unity the discriminant function is then

$$ X = x_1 + 5.9037 x_2 - 7.1299 x_3 - 10.1036 x_4. \tag{28.70} $$

The mean of X for versicolor, obtained by substituting the means of the x's for that species,
is found to be −21.4815, and that for setosa is 12.3345. The difference is thus 33.816 cm.
Let us compare this with its standard error to see whether it is significant of real differences
in the values of X for the two species.

From the matrix of sums of squares and products we find

$$ N \operatorname{var} X = \lambda^i \lambda^j a_{ij} = 1085.5522, $$

where the λ's are, of course, the coefficients in (28.70). N here is the number of degrees
of freedom of the estimate of the variance. There are 100 members altogether, with 99
degrees of freedom, but we have eliminated four corresponding to the means of the four
variates. We therefore take N to be 99 − 4 = 95, and find

$$ \operatorname{var} X = 11.4269. $$

This is the variance of a single value. That of the difference of the two means of 50 values
is obtained by division by 25 and is thus 0.4571, the corresponding standard error being
0.676.

The observed difference of means, viz. 33.816, is about 50 times this amount, and
there is thus a real difference in the values of X for the two species. In other words the
discriminant function is a good one. It is best among the linear functions of the x's because
we have chosen it so that the difference of two values, divided by their estimated standard
error, shall be the greatest possible. To use the function we should, given a flower of
doubtful species, calculate X for it and assign it to one species or the other according as
X were nearer to the mean value of X for one species or the other. If, of course,
the observed value differed from the mean values by more than twice the standard error
of each, we should begin to doubt whether it belonged to either.
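The arithmetic of this example can be retraced by solving the equations typified by (28.67) directly, rather than through the inverse matrix. The following is a sketch only: it uses the sums of squares and products and the means tabulated in the example, a plain Gaussian elimination written for the purpose, and illustrative variable names.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        x[i] = (aug[i][size] - sum(aug[i][j] * x[j] for j in range(i + 1, size))) / aug[i][i]
    return x


# Sums of squares and products about the means (cm^2), and the species means (cm).
S = [[19.1434, 9.0356, 9.7634, 3.2394],
     [9.0356, 11.8658, 4.6232, 2.4746],
     [9.7634, 4.6232, 12.2978, 3.8794],
     [3.2394, 2.4746, 3.8794, 2.4604]]
versicolor = [5.936, 2.770, 4.260, 1.326]
setosa = [5.006, 3.428, 1.462, 0.246]

d = [v - s for v, s in zip(versicolor, setosa)]
lam = solve(S, d)                          # (28.69) with c = 1
coef = [value / lam[0] for value in lam]   # coefficient of x_1 made unity, as in (28.70)

mean_v = sum(c * m for c, m in zip(coef, versicolor))
mean_s = sum(c * m for c, m in zip(coef, setosa))
within = sum(coef[i] * coef[j] * S[i][j] for i in range(4) for j in range(4))
var_x = within / 95                        # 99 - 4 degrees of freedom
se_diff = math.sqrt(var_x / 25)            # standard error of the difference of two means of 50
```

Small discrepancies in the last printed figure are to be expected, since the tabulated inputs are themselves rounded.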

The analysis may be put in rather a different way. Suppose we analyse the variation
of X between and within species. The sum of squares between species in the 50 × 2
classification is

$$ 50 \left\{ (\bar{X}_1 - \bar{X})^2 + (\bar{X}_2 - \bar{X})^2 \right\}, $$

where $\bar{X}_1$, $\bar{X}_2$ are the respective means and $\bar{X}$ the mean of the whole. This reduces to $25 D^2$.
The sum of squares within classes is 1085.55 with 95 d.f., as found above, and we have:

                        Sum of Squares.     d.f.

    Between species       28,588.05           4
    Within species         1,085.55          95

    Totals                29,673.60          99

Our method of selecting the discriminant function has been such as to minimise the sum
of squares within species and, for constant total, to maximise the sum between species,
and hence to maximise the ratio of the latter to the former. For the moment we cannot
assume that this ratio may be tested in the z-distribution in the usual way, though we shall
see presently that this is so.

28.20. The relationship of discriminatory analysis for two classes and the theory of
regression may be brought out by introducing a formal variate y for the classes. If there
are $n_1$ members in one class and $n_2$ in the other we shall assign the values

$$ \frac{n_2}{n_1 + n_2}, \qquad -\frac{n_1}{n_1 + n_2} \tag{28.71} $$

to the y-variate for the two classes respectively. The mean of y for the whole sample is
then zero and the sum of squares is

$$ \frac{n_1 n_2}{n_1 + n_2} = \zeta, \text{ say.} \tag{28.72} $$

Considering now

$$ Y = \lambda^i x_i \tag{28.73} $$

as a regression equation, we find for the coefficients λ

$$ \Sigma (y x_j) - \lambda^i\, \Sigma (x_i x_j) = 0, $$

or

$$ \Sigma (y x_j) - \lambda^i a_{ij} = 0. $$

Now

$$ \Sigma (y x_j) = \frac{n_1 n_2}{n_1 + n_2} \left( \bar{x}_{j1} - \bar{x}_{j2} \right) = \zeta d_j, $$

where the suffixes of the $\bar{x}$'s relate to the first and second classes. Thus

$$ \zeta d_j = \lambda^i a_{ij}, \tag{28.74} $$

which is another way of writing (28.69) with a particular value for the constant c.


28.21. Pursuing the analogy with regression analysis further, we see that since

$$ \Sigma (y^2) = \zeta $$

and

$$ \Sigma (y x_j) = \zeta d_j $$

we may analyse the sums of squares as

    Sums of squares.              d.f.

    $\zeta \lambda^i d_i$                  p
    $\zeta (1 - \lambda^i d_i)$            $n_1 + n_2 - p - 1$

    $\zeta$                                $n_1 + n_2 - 1$            (28.75)

as for a regression line. If R is the multiple correlation of y on the x-variates,

$$ R^2 = \lambda^i d_i. \tag{28.76} $$

In ordinary regression analysis we may test the ratio $R^2 / (1 - R^2)$, multiplied by
suitable constants, in the z-distribution; but this depends on the assumption that the
dependent variate y is normal for any fixed x's. Here we have the case when the dependent
variate is fixed but the x's are normal. The test still holds in such a case, the reason being
the kind of duality we noted in 28.14 in arriving at Hotelling's distribution. The distribution
of angles between a fixed plane and a random vector is the same as that between
a fixed vector and a random plane. Consequently the table of (28.75) can be regarded
as an analysis of variance and the z-test applied.
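The regression form can be tried on the iris figures of Example 28.2. The sketch below assumes that the x's in (28.74) are measured from the general means, so that the sums of squares and products are the within-species sums plus ζ d_i d_j; R² from (28.76) should then equal the ratio of the between-species to the total sum of squares found earlier. The solver and the variable names are illustrative.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        x[i] = (aug[i][size] - sum(aug[i][j] * x[j] for j in range(i + 1, size))) / aug[i][i]
    return x


# Within-species sums of squares and products and differences of means (Example 28.2).
W = [[19.1434, 9.0356, 9.7634, 3.2394],
     [9.0356, 11.8658, 4.6232, 2.4746],
     [9.7634, 4.6232, 12.2978, 3.8794],
     [3.2394, 2.4746, 3.8794, 2.4604]]
d = [0.930, -0.658, 2.798, 1.080]
n1 = n2 = 50

zeta = n1 * n2 / (n1 + n2)                      # sum of squares of the formal variate y
# Sums of squares and products about the general means (assumption stated above).
total = [[W[i][j] + zeta * d[i] * d[j] for j in range(4)] for i in range(4)]
lam = solve(total, [zeta * dj for dj in d])     # normal equations (28.74)
r_squared = sum(l * dj for l, dj in zip(lam, d))  # (28.76)

between_over_total = 28588.05 / 29673.60        # from the earlier analysis of X
```

The two quantities agree to the accuracy of the printed figures, which illustrates the identity of the two ways of setting out the analysis.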

28.22. We may extend the discriminant function to the case when the property to 
be discriminated is not, as above, a matter of allocation to one of two classes, but to several 
which may in particular be determined by certain values of a continuous variate. If we 
have various measurements of p x-variates corresponding to values of a y-variate, we may
form the regression of y on the x’s and use the resulting function as a discriminator. As 
in the case of dichotomy, the regression will maximise the difference between classes as 
compared with intra-class variation; and its significance may be tested in much the 
same way. 

Example 28.3 (from M. M. Barnard, 1935). 

An investigation was undertaken into the changes taking place over time of the
characteristics of certain Egyptian skulls. There were four sets of skulls, known to be from
Late Predynastic, Sixth to Twelfth, Twelfth to Thirteenth and Ptolemaic dynasties
respectively, and the relative time-intervals were taken to be in the proportions 2 : 1 : 2, so that
the values of t for the four periods may be taken to be respectively −5, −1, +1, +5.
For the skulls four measurements were selected:

    x_1, maximum breadth;
    x_2, basi-alveolar length;
    x_3, nasal height;
    x_4, basi-bregmatic height.

It is required to find a function

$$ X = \lambda^1 x_1 + \lambda^2 x_2 + \lambda^3 x_3 + \lambda^4 x_4 $$

which will best discriminate between skulls belonging to different periods.

The means of the series were as follows, the sample numbers also being shown:

    Variate.     Series I         Series II        Series III       Series IV
                 (n_1 = 91).      (n_2 = 162).     (n_3 = 70).      (n_4 = 75).

    x_1          133.582,418      134.265,432      134.371,429      135.306,667
    x_2           98.307,692       96.462,963       95.857,143       95.040,000
    x_3           50.836,165       51.148,148       50.100,000       52.093,333
    x_4          133.000,000      134.882,716      133.642,857      131.466,667

The sums of squares and products about the means are:

             x_1              x_2              x_3              x_4

    x_1    9661.997,470      445.573,301     1130.623,900     2148.584,219
    x_2        ...           9073.115,027     1239.221,990     2165.812,722
    x_3        ...               ...          3938.320,361     1271.054,662
    x_4        ...               ...              ...          8741.508,829

The mean value of t, $\bar{t}$, for the 398 observations is −0.432,161, and the values of $t - \bar{t}$
for the four series are accordingly

    -4.567,839 ;   -0.567,839 ;   1.432,161 ;   5.432,161.

The sums $\Sigma\, x_j (t - \bar{t})$ are respectively

    x_1       718.762,86
    x_2     -1407.260,75
    x_3       410.101,94
    x_4      -733.428,32

and finally, $\Sigma (t - \bar{t})^2 = 4307.668,32$.
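The time-scale arithmetic can be reproduced exactly from the sample numbers and the values −5, −1, +1, +5 assigned to the four series. The sketch below uses rational arithmetic so that no rounding enters until the results are printed; the variable names are illustrative.

```python
from fractions import Fraction

sizes = [91, 162, 70, 75]      # numbers of skulls in Series I to IV
times = [-5, -1, 1, 5]         # values of t, with intervals in the proportions 2 : 1 : 2

n = sum(sizes)
t_bar = Fraction(sum(size * t for size, t in zip(sizes, times)), n)
deviations = [t - t_bar for t in times]
sum_sq = sum(size * dev ** 2 for size, dev in zip(sizes, deviations))

print(n, float(t_bar), [round(float(dev), 6) for dev in deviations], float(sum_sq))
```

The mean and the four deviations agree with the figures quoted, and the sum of squares agrees with 4307.6683 to the four decimal places used in the later analysis.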

We could obtain the coefficients A from the reciprocal of the matrix above on the lines
of the previous example. It is also instructive to observe, from the analogy with regressions,
that instead of that matrix we may use the matrix (depending on one extra degree
of freedom, 395 in all) obtained by adding to the sums of squares the regressions on time.
For instance, instead of 9661.997,470 we have 9661.997,470 + (718.762,86)²/4307.668,32.
The resulting matrix is


         x_1             x_2             x_3             x_4
x_1      9781.927,828    210.762,489     1199.052,135    2026.206,952
x_2      ...             9532.849,476    1105.246,827    2505.414,318
x_3      ...             ...             3977.363,203    1201.230,304
x_4      ...             ...             ...             8866.382,928


The reciprocal of this is (units = $10^{-6}$):

         x_1             x_2             x_3             x_4
x_1      110.368,975     6.938,481       −28.145,236     −23.361,935
x_2      ...             115.693,529     −24.948,984     −30.767,069
x_3      ...             ...             273.988,409     −23.666,591
x_4      ...             ...             ...             129.990,069


The resulting values of A are 

$A_1$ = 0.075,156,739, $A_2$ = −0.145,490,050,

$A_3$ = 0.144,600,884, $A_4$ = −0.078,538,419

and these, or constant multiples of them, give us the constants in the discriminant function 
which will best enable us to assign a skull to the correct period by measurements of the 
four specified variates. 

In this analysis we have 398 members, but of the 397 d.f. we have discarded two with
the general mean. The d.f. of the sum 4307.6683 = $\Sigma (t - \bar{t})^2$ are 395, of which four are
attributable to regressions on the other variates. For the contribution of these four we
have

$A_1$ × 718.762,86 + etc. = 375.6657.

The analysis of variance is thus:

               Sum of Squares    d.f.    Quotient
Regression         375.6657         4
Remainder         3932.0026       391     10.0563
Totals            4307.6683       395



The analogy of the discriminant function with regressions noted above may be used
to provide standard errors of the coefficients A. In our present case the variance of $A_1$
is obtained by multiplying the remainder quotient, viz. 10.0563, by the term corresponding
to $x_1$ in the reciprocal matrix of sums of squares of the x's, namely 110.368,975 × $10^{-6}$.
This gives a standard error of 0.0333. We obtain finally

$A_1$ = 0.0752 ± 0.0333

$A_2$ = −0.1455 ± 0.0341

$A_3$ = 0.1446 ± 0.0525

$A_4$ = −0.0785 ± 0.0362.

All coefficients exceed twice their standard error, and hence all the variates are useful in 
discriminating between skulls of different periods. 
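
The standard errors quoted above follow from the diagonal of the reciprocal matrix and the remainder quotient. A minimal sketch, with illustrative variable names, is:

```python
import math

remainder_quotient = 10.0563          # remainder mean square from the analysis
# Diagonal terms of the reciprocal matrix, in units of 1e-6 (from the text).
recip_diag = [110.368975, 115.693529, 273.988409, 129.990069]
A = [0.0752, -0.1455, 0.1446, -0.0785]

# Standard error of each coefficient: sqrt(quotient * diagonal term).
se = [math.sqrt(remainder_quotient * d * 1e-6) for d in recip_diag]
ratios = [abs(a) / e for a, e in zip(A, se)]

for a, e, r in zip(A, se, ratios):
    print(f"{a:+.4f} +/- {e:.4f}   ratio {r:.2f}")
```

Each ratio of a coefficient to its standard error comes out above 2, which is the basis of the statement that all four variates are useful.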

I am indebted to Dr. M. S. Bartlett for the calculations of this example. His results 
differ from those reached by Miss Barnard in her original investigation since she took an 
unweighted regression of the variates with time, whereas he has weighted the values 
according to sample numbers. He also notes that the significance of the results has been 
tested above on the basis of variability within classes, but that a fuller analysis of the means, 
bringing back the two degrees of freedom discarded, reveals further differences between the 
series. Thus, though the discriminant function will efficiently sort the series examined in 
relation to their periods, we must be cautious about associating the observed differences 
with the time-changes. 


Canonical Correlations 


28.23. We now turn to consider the general theory of the relations between two
sets of variates $x_1 \ldots x_p$ and $x_{p+1} \ldots x_{p+q}$, where we suppose that $p \le q$. Following
Hotelling (1936b), we shall show that in general there can be found linear transformations
to variates $\xi_1 \ldots \xi_p$, $\xi_{p+1} \ldots \xi_{p+q}$ such that

(a) all the ξ's have unit variance and zero mean;

(b) any ξ in the p-group is independent of the other ξ's in that group;

(c) any ξ in the q-group is independent of the other ξ's in that group;

(d) the correlation between any ξ in the p-group and any ξ in the q-group is zero except
for p correlations $\rho_1 \ldots \rho_p$, which may be taken to be the correlations between
$\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$.

The variates ξ are then said to be canonical variates and the ρ's canonical correlations.

This part of our work is, fundamentally, the reduction of two quadratic forms and an
associated bilinear form to canonical types and does not depend on the distribution laws
of the variates. Furthermore, the reduction can be carried out either on the population
or on the sample. In the latter case it will yield sample canonical correlations which may
be written $r_1 \ldots r_p$ and regarded as sample-values of the parent ρ's.

We will suppose that our variates x have zero means and dispersions denoted by $\sigma_{ij}$,
where, for the time being, we use σ to denote a variance or covariance instead of the more
usual $\sigma^2$. Those dispersions in the p-group we denote by Greek affixes: $\sigma_{\alpha\beta}$, and those
in the q-group by Roman affixes: $\sigma_{ij}$. For a covariance of a p-variate with a q-variate
we write one Greek and one Roman affix: $\sigma_{\alpha i}$.

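Consider now a particular pair of variates given by

$$\xi = c^{\alpha} x_{\alpha}, \quad \alpha = 1, \ldots, p; \qquad \eta = d^{a} x_{a}, \quad a = p+1, \ldots, p+q. \tag{28.77}$$

If their variances are unity we have

$$c^{\alpha} c^{\beta} \sigma_{\alpha\beta} = 1, \qquad d^{a} d^{b} \sigma_{ab} = 1. \tag{28.78}$$

We will also impose the condition that their correlation R is stationary for variations in
the coefficients c and d, i.e. that

$$R = c^{\alpha} d^{a} \sigma_{\alpha a} \quad \text{stationary.} \tag{28.79}$$

Equations (28.78) and (28.79) then require an unconditioned stationary value of

$$c^{\alpha} d^{a} \sigma_{\alpha a} - \tfrac{1}{2}\lambda\, c^{\alpha} c^{\beta} \sigma_{\alpha\beta} - \tfrac{1}{2}\mu\, d^{a} d^{b} \sigma_{ab} \tag{28.80}$$

where λ and μ are undetermined multipliers. This leads to

$$c^{\alpha} \sigma_{\alpha a} - \mu\, d^{b} \sigma_{ab} = 0, \qquad d^{a} \sigma_{\alpha a} - \lambda\, c^{\beta} \sigma_{\alpha\beta} = 0. \tag{28.81}$$

Multiplying the first equation by $d^{a}$ and summing and the second by $c^{\alpha}$ and summing,
we have, in virtue of (28.78) and (28.79),

$$R = \lambda = \mu. \tag{28.82}$$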

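Equations (28.81) will then be soluble for the p + q unknowns c and d if the determinant
of their array vanishes, that is if, writing λ for the constants μ and λ,

$$\begin{vmatrix}
-\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1p} & \sigma_{1,p+1} & \cdots & \sigma_{1,p+q} \\
\vdots & & \vdots & \vdots & & \vdots \\
-\lambda\sigma_{p1} & \cdots & -\lambda\sigma_{pp} & \sigma_{p,p+1} & \cdots & \sigma_{p,p+q} \\
\sigma_{p+1,1} & \cdots & \sigma_{p+1,p} & -\lambda\sigma_{p+1,p+1} & \cdots & -\lambda\sigma_{p+1,p+q} \\
\vdots & & \vdots & \vdots & & \vdots \\
\sigma_{p+q,1} & \cdots & \sigma_{p+q,p} & -\lambda\sigma_{p+q,p+1} & \cdots & -\lambda\sigma_{p+q,p+q}
\end{vmatrix} = 0, \tag{28.83}$$

an equation determining λ. Before studying it further we will throw the equation into
a somewhat different form.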


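28.24. We may write (28.83) as

$$\begin{vmatrix} -\lambda\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & -\lambda\sigma_{ij} \end{vmatrix} = 0. \tag{28.84}$$

Multiplying the first p rows by −λ and dividing the last q columns by −λ we find the
equivalent form

$$(-\lambda)^{q-p} \begin{vmatrix} \lambda^{2}\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & \sigma_{ij} \end{vmatrix} = 0. \tag{28.85}$$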

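Writing, in conformity with our usual notation, $(\sigma^{ij})$ for the matrix inverse to $(\sigma_{ij})$ and
remembering that

$$\sigma^{ij}\sigma_{ik} = \delta^{j}_{k},$$

let us multiply (28.85) on the left by

$$\begin{vmatrix} \delta_{\alpha\gamma} & -\sigma^{ik}\sigma_{\gamma k} \\ 0 & \sigma^{ik} \end{vmatrix}. \tag{28.86}$$

The product of determinants is then

$$\begin{vmatrix} \lambda^{2}\sigma_{\beta\gamma} - \sigma^{ik}\sigma_{i\beta}\sigma_{\gamma k} & \sigma_{\gamma j} - \sigma^{ik}\sigma_{\gamma k}\sigma_{ij} \\ \sigma^{ik}\sigma_{i\beta} & \sigma^{ik}\sigma_{ij} \end{vmatrix}
= \begin{vmatrix} \lambda^{2}\sigma_{\beta\gamma} - \sigma^{ik}\sigma_{i\beta}\sigma_{\gamma k} & 0 \\ \sigma^{ik}\sigma_{i\beta} & \delta^{k}_{j} \end{vmatrix},$$

which gives

$$(-\lambda)^{q-p}\, \left|\, \lambda^{2}\sigma_{\beta\gamma} - \sigma^{ik}\sigma_{i\beta}\sigma_{\gamma k} \,\right| = 0, \tag{28.87}$$

a determinant with p rows and columns multiplied by a power of λ.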


28.25. Returning now to our original problem, we see that if a simple root of (28.83)
is substituted in (28.81) the c's and d's are determinate, except of course that they may be
replaced by −c and −d. For a root of multiplicity m they are determinate except for
m − 1 assignable constants, a result we take without proof from the theory of algebraic
forms (reference may be made to Hotelling's paper for details).

From (28.87) we see that the equation in λ has p + q roots. It cannot have fewer,
for the coefficient of the highest power of λ in (28.83) is the product of two principal minors
which do not vanish unless the variates are linearly dependent, a case which we exclude
from the discussion. Of these p + q roots q − p are zero. The remaining 2p can be
grouped in pairs, each of which is the negative of the other. There are thus roots which
we may write $\pm\rho_1, \ldots \pm\rho_p$. We choose as the roots those which are not negative and
proceed to prove that they are the canonical correlations as we have defined them. That
they are, in fact, correlations follows from (28.82).

Suppose we have a root $\rho_\gamma$ and determine the corresponding constants $c_\gamma$ and $d_\gamma$ and
hence a pair of variates $\xi_\gamma$ and $\eta_\gamma$. Then we have, from (28.81),

$$c^{\alpha}_{\gamma}\sigma_{\alpha a} = \rho_\gamma\, d^{b}_{\gamma}\sigma_{ab}, \qquad d^{a}_{\gamma}\sigma_{\alpha a} = \rho_\gamma\, c^{\beta}_{\gamma}\sigma_{\alpha\beta}. \tag{28.88}$$

Similar equations obtain for a second pair, say $\xi_\delta$ and $\eta_\delta$. Between these four variates
there are six correlations, two of which are $\rho_\gamma$ and $\rho_\delta$. We wish to show that the other
four vanish. They are

$$E(\xi_\gamma\xi_\delta) = c^{\alpha}_{\gamma}c^{\beta}_{\delta}\sigma_{\alpha\beta}, \quad E(\eta_\gamma\eta_\delta) = d^{a}_{\gamma}d^{b}_{\delta}\sigma_{ab}, \quad E(\xi_\gamma\eta_\delta) = c^{\alpha}_{\gamma}d^{a}_{\delta}\sigma_{\alpha a}, \quad E(\xi_\delta\eta_\gamma) = c^{\alpha}_{\delta}d^{a}_{\gamma}\sigma_{\alpha a}. \tag{28.89}$$

Multiply the first of (28.88) by $d^{a}_{\delta}$ and sum. Using (28.89), we have

$$E(\xi_\gamma\eta_\delta) = \rho_\gamma E(\eta_\gamma\eta_\delta). \tag{28.90}$$

Similarly from the second of (28.88) multiplied by $c^{\alpha}_{\delta}$,

$$E(\xi_\delta\eta_\gamma) = \rho_\gamma E(\xi_\gamma\xi_\delta). \tag{28.91}$$

Interchanging γ and δ we find from (28.90) and (28.91)

$$\rho_\gamma E(\eta_\gamma\eta_\delta) = \rho_\delta E(\xi_\gamma\xi_\delta). \tag{28.92}$$

Equally, again interchanging γ and δ in (28.92) we have

$$\rho_\delta E(\eta_\gamma\eta_\delta) = \rho_\gamma E(\xi_\gamma\xi_\delta). \tag{28.93}$$

Thus, unless $\rho_\gamma^2 = \rho_\delta^2$,

$$E(\xi_\gamma\xi_\delta) = E(\eta_\gamma\eta_\delta) = 0. \tag{28.94}$$

It follows from (28.90) and (28.91) that the other correlations also vanish.

We have only to round off the proof by showing that if ρ is a root of multiplicity m
the property still holds. This follows from the consideration that we may then choose
our c's and d's to obey certain orthogonal conditions ensuring that

$$E(\xi_\gamma\xi_\delta) + E(\eta_\gamma\eta_\delta) = 0. \tag{28.95}$$

It will then follow from (28.92) that each expectation vanishes unless $\rho_\gamma = \rho_\delta = 0$; and
even in this case, (28.91) and (28.92) show that two expectations vanish, and we may then
choose our assignable constants so that the others vanish.


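28.26. When the variates are put into canonical form the dispersion matrix reduces to

$$\begin{pmatrix}
1 & 0 & \cdots & 0 & \rho_1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \rho_2 & \cdots & 0 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & \rho_p & \cdots & 0 \\
\rho_1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & \rho_2 & \cdots & 0 & 0 & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & & \vdots & & \vdots \\
0 & 0 & \cdots & \rho_p & 0 & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix} \tag{28.96}$$

with a determinant equal to

$$(1 - \rho_1^2)(1 - \rho_2^2) \ldots (1 - \rho_p^2).$$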
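
The value of the determinant can be checked numerically for a small case. The sketch below builds the canonical dispersion matrix for illustrative correlations (the values 0.6 and 0.3 and the choice q = 3 are assumptions, not data from the text) and compares its determinant with the product formula.

```python
def canonical_dispersion(rho, q):
    """Dispersion matrix of canonical variates for p = len(rho) <= q."""
    p = len(rho)
    size = p + q
    m = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for i, r in enumerate(rho):
        m[i][p + i] = m[p + i][i] = r    # rho_i links xi_i with xi_{p+i}
    return m

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    k, d = len(a), 1.0
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d                       # a row swap changes the sign
        d *= a[c][c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d

rho = [0.6, 0.3]                         # illustrative values only
lhs = det(canonical_dispersion(rho, q=3))
rhs = 1.0
for r in rho:
    rhs *= 1.0 - r * r
print(round(lhs, 6), round(rhs, 6))
```

The q − p variates that pair with nothing contribute unit factors, so only the p canonical correlations enter the product.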


Example 28.4 (from Hotelling, 1936b, dealing with data of T. L. Kelley).

140 seventh-grade school children were given four tests in (a) reading speed, (b) reading
power, (c) arithmetic speed, and (d) arithmetic power. It is required to find canonical
variates for the two reading tests and the two arithmetic tests.

The correlations between the variates were:

         x_1        x_2        x_3        x_4
x_1      1.0000     0.6328     0.2412     0.0586
x_2      0.6328     1.0000    −0.0553     0.0655
x_3      0.2412    −0.0553     1.0000     0.4248
x_4      0.0586     0.0655     0.4248     1.0000

The determinant (28.83) becomes

$$\begin{vmatrix}
-\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\
-0.6328\lambda & -\lambda & -0.0553 & 0.0655 \\
0.2412 & -0.0553 & -\lambda & -0.4248\lambda \\
0.0586 & 0.0655 & -0.4248\lambda & -\lambda
\end{vmatrix} = 0$$

or

0.491,370 λ⁴ − 0.078,803,4 λ² + 0.000,362,490 = 0,

giving

λ² = 0.155,635 or 0.004,740

with

λ = 0.3945 or 0.0688.

To find the transformed variates themselves we use (28.81). For instance, with the
root 0.3945 for μ, we have

$c^1 + 0.6328\,c^2 - 0.6114\,d^1 - 0.1485\,d^2 = 0$

$0.6328\,c^1 + c^2 + 0.1402\,d^1 - 0.1660\,d^2 = 0$

$-0.6114\,c^1 + 0.1402\,c^2 + d^1 + 0.4248\,d^2 = 0$

$-0.1485\,c^1 - 0.1660\,c^2 + 0.4248\,d^1 + d^2 = 0$

The last equation is linearly dependent on the other three, so adds nothing. In the other
three we solve for the ratios of c's and d's, finding

$c^1 : c^2 : d^1 : d^2 = -2.7772 : 2.2655 : -2.4404 : 1.$

Thus the transformed variates are

$k_1 \xi_1 = -2.7772\,x_1 + 2.2655\,x_2$
$k_2 \eta_1 = -2.4404\,x_3 + x_4,$

where $k_1$ and $k_2$ may be chosen so that the variances of $\xi_1$ and $\eta_1$ are unity, if desired. Similar
equations with the root 0.0688 will give us a further pair of canonical co-ordinates. Those
we have worked out have the maximum correlation, the other pair having the minimum
and therefore being of less interest.
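
The roots in this example can be recovered from the correlation table by reducing the determinantal equation to the p-rowed form of (28.87). The following is a sketch under stated assumptions: the helper functions are illustrative, the entry 0.0655 is used for the correlation of $x_2$ with $x_4$, and the 2 × 2 eigenvalue formula is a convenience that applies only because p = 2 here.

```python
import math

# Correlation blocks from Example 28.4 (reading tests vs arithmetic tests).
R11 = [[1.0, 0.6328], [0.6328, 1.0]]
R22 = [[1.0, 0.4248], [0.4248, 1.0]]
R12 = [[0.2412, 0.0586], [-0.0553, 0.0655]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

R21 = [[R12[j][i] for j in range(2)] for i in range(2)]   # transpose of R12

# The squared roots are the eigenvalues of R11^-1 R12 R22^-1 R21, i.e. the
# roots in lambda^2 of |lambda^2 R11 - R12 R22^-1 R21| = 0.
M = mul2(mul2(inv2(R11), R12), mul2(inv2(R22), R21))
tr, dt = M[0][0] + M[1][1], det2(M)
disc = math.sqrt(tr * tr - 4.0 * dt)
lam2 = [(tr + disc) / 2.0, (tr - disc) / 2.0]
rho = [math.sqrt(v) for v in lam2]

print([round(v, 6) for v in lam2])
print([round(r, 4) for r in rho])
```

Working with the reduced form gives the p non-negative roots directly, without the q − p zero roots or the negative partners of (28.83).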


28.27. In practical cases it is of some importance to know whether an observed
canonical correlation $r_1$, say, is significant of real correlation. The problem has been solved
for large samples but not completely for small samples. We shall conclude this chapter
with a short account of the main results which have been reached.

For large samples we shall show that, for the standard error of a canonical correlation,

$$\operatorname{var} r = \frac{1}{n}(1 - r^2)^2, \tag{28.97}$$

a remarkable result showing that the variance is the same as for a product-moment
coefficient.
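
As an illustration of (28.97), the large-sample standard error can be evaluated for the larger root of Example 28.4. This is a sketch only: the function name is hypothetical, and treating n = 140 as "large" is an assumption made for the sake of the example.

```python
import math

def canonical_corr_se(r, n):
    """Large-sample standard error from var r = (1 - r^2)^2 / n."""
    return (1.0 - r * r) / math.sqrt(n)

se = canonical_corr_se(0.3945, 140)
print(round(se, 4))
```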

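Denoting as usual the sample covariance by $a_{ij}$ we have to the first order

$$E(a_{ij}) = \sigma_{ij}. \tag{28.98}$$

To the same order,

$$E(a_{ij}a_{kl}) = \frac{1}{n^2}\sum_{\alpha,\beta} E(x_{i\alpha}x_{j\alpha}x_{k\beta}x_{l\beta}).$$

If α ≠ β the sums on the right are independent, and there are n(n − 1) such cases. When
α = β we have n terms such as

$$E(x_{i\alpha}x_{j\alpha}x_{k\alpha}x_{l\alpha}) = \sigma_{ij}\sigma_{kl} + \sigma_{il}\sigma_{jk} + \sigma_{ik}\sigma_{jl}, \tag{28.99}$$

as follows from the consideration that the characteristic function of the multivariate normal
form is

$$\exp\left(-\tfrac{1}{2}\sigma_{ij}t_i t_j\right)$$

(cf. 15.12, vol. I, p. 376).

Hence we have

$$E(a_{ij}a_{kl}) = \frac{n(n-1)}{n^2}\sigma_{ij}\sigma_{kl} + \frac{n}{n^2}(\sigma_{ij}\sigma_{kl} + \sigma_{il}\sigma_{jk} + \sigma_{ik}\sigma_{jl}) = \sigma_{ij}\sigma_{kl} + \frac{1}{n}(\sigma_{il}\sigma_{jk} + \sigma_{ik}\sigma_{jl}). \tag{28.100}$$

Thus

$$E(da_{ij}\,da_{kl}) = E(a_{ij}a_{kl}) - \sigma_{ij}\sigma_{kl} = \frac{1}{n}(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}). \tag{28.101}$$

Now for any canonical correlation r we have

$$c^{\alpha}c^{\beta}a_{\alpha\beta} = 1, \qquad d^{a}d^{b}a_{ab} = 1, \qquad r = c^{\alpha}d^{b}a_{\alpha b}. \tag{28.102}$$

If now we define for the sampling deviations in c's and d's corresponding to deviations
in the σ's,

$$\Delta c^{\alpha} = \sum_{t,u}\frac{\partial c^{\alpha}}{\partial\sigma_{tu}}\Delta a_{tu}, \qquad \Delta d^{a} = \sum_{t,u}\frac{\partial d^{a}}{\partial\sigma_{tu}}\Delta a_{tu}, \tag{28.103}$$

we find

$$\begin{aligned}
2a_{\alpha\beta}c^{\alpha}\Delta c^{\beta} + c^{\alpha}c^{\beta}\Delta a_{\alpha\beta} &= 0 \\
2a_{ab}d^{a}\Delta d^{b} + d^{a}d^{b}\Delta a_{ab} &= 0 \\
\Delta r_1 = a_{\alpha b}c^{\alpha}\Delta d^{b} + a_{\alpha b}d^{b}\Delta c^{\alpha} &+ c^{\alpha}d^{b}\Delta a_{\alpha b}.
\end{aligned} \tag{28.104}$$

Without loss of generality we may now suppose the variates canonical and hence put
$c^1 = 1$, $c^2 = c^3 = \ldots = c^p = 0$, $d^1 = 1$, $d^2 = \ldots = d^q = 0$. We then find

$$2\Delta c^1 + \Delta a_{11} = 0, \qquad 2\Delta d^1 + \Delta a_{p+1,p+1} = 0,$$
$$\Delta r_1 = r_1\Delta d^1 + r_1\Delta c^1 + \Delta a_{1,p+1}.$$

Substituting from the first two in the third of these equations we have

$$\Delta r_1 = \Delta a_{1,p+1} - \tfrac{1}{2}r_1(\Delta a_{11} + \Delta a_{p+1,p+1}).$$

Similar equations apply for any other simple root, e.g.

$$\Delta r_2 = \Delta a_{2,p+2} - \tfrac{1}{2}r_2(\Delta a_{22} + \Delta a_{p+2,p+2}).$$

Squaring these equations and substituting from (28.101) we find

$$nE(\Delta r_1)^2 = (1 - r_1^2)^2 \tag{28.105}$$
$$E(\Delta r_1\,\Delta r_2) = 0. \tag{28.106}$$

It follows that

$$\operatorname{var} r_1 = \frac{1}{n}(1 - r_1^2)^2, \qquad \operatorname{cov}(r_1, r_2) = 0 \tag{28.107}$$

to our order of approximation.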


28.28. Equation (28.107) applies to a simple non-vanishing correlation. If a canonical
correlation vanishes and p = q, the result holds, with the qualification that sample
values of r near the zero root must be allowed to have positive or negative values, or alternatively
that the distribution of r is that of absolute values of a normal variate (cf. Exercise
28.7). If p = 2, q > 2 a zero root is of multiplicity q at least. In this case, if it has exactly
multiplicity q, $nr^2$ is distributed as $\chi^2$ with q − 1 degrees of freedom. For the proof of
this result see Hotelling (1936b).

There is another rather curious difficulty in testing the significance of roots of the
equation giving the canonical correlations, namely, that if several roots exist it is not possible
to relate them with certainty to specified parent correlations: any one might have
arisen from any one of the parent values. This is not serious for large samples when the
roots are distinct, since the sample values cluster closely round the parent values; but
for small samples or canonical correlations in the parent which are close together it presents
a theoretical problem of a novel kind. See Hotelling (1936b) and Bartlett (1941) on
this point.


28.29. We proceed to find the sampling distribution of canonical correlations in the
case when the parent values are all zero and the p-variates and q-variates accordingly
independent.

Reverting to equation (28.87) in the form appropriate to samples, we have

$$\left|\,\lambda^2 a_{\beta\gamma} - a^{ik}a_{i\beta}a_{\gamma k}\,\right| = 0. \tag{28.108}$$

We write

$$t_{\beta\gamma} = a^{ik}a_{i\beta}a_{\gamma k} \tag{28.109}$$

and

$$a_{\beta\gamma} = z_{\beta\gamma} + t_{\beta\gamma} \tag{28.110}$$

so that (28.108) becomes

$$\left|\,\lambda^2(z_{\beta\gamma} + t_{\beta\gamma}) - t_{\beta\gamma}\,\right| = 0. \tag{28.111}$$

The significance of this device is that z and t are distributed independently in Wishart's
form, as we now proceed to show.

One instructive way of looking at the problem is to consider the regression of the
p-way vector y on a q-way vector x. Corresponding to the univariate equation

$$y = bx + e, \tag{28.112}$$

where e is a residual, we have

$$y_\alpha = b_{\alpha i}x_i + e_\alpha, \tag{28.113}$$

where the b's are given by minimising the sum of the n values of $e_\alpha^2$,

namely, by

$$\Sigma\,(y_\alpha - b_{\alpha j}x_j)\,x_i = 0$$

or, in our notation for canonical variates,

$$a_{\alpha i} - b_{\alpha j}a_{ij} = 0,$$

which yields

$$b_{\alpha j} = a_{\alpha i}a^{ij}. \tag{28.114}$$

We may analyse the variance of y in the form

$$\Sigma(y_\alpha^2) = \Sigma(b_{\alpha i}x_i + e_\alpha)^2 = b_{\alpha i}b_{\alpha k}a_{ik} + \Sigma(e_\alpha^2), \tag{28.115}$$

corresponding to the univariate case

$$\Sigma(y^2) = b^2\,\Sigma(x^2) + \Sigma(e^2),$$

and the two constituents on the right in (28.115) are independent, just as in the univariate
case. This may be shown by a direct extension of 22.19.

Furthermore, if we wish to find the linear function of the y's, say $\lambda^\alpha y_\alpha$, which has
maximum correlation with the x's, we have to maximise the ratio

$$\frac{\Sigma(\lambda^\alpha b_{\alpha i}x_i)^2}{\Sigma(\lambda^\alpha y_\alpha)^2} = \frac{\lambda^\alpha\lambda^\beta b_{\alpha i}b_{\beta j}a_{ij}}{\lambda^\alpha\lambda^\beta a_{\alpha\beta}} = r^2. \tag{28.116}$$

This is equivalent to maximising unconditionally $\lambda^\alpha\lambda^\beta(b_{\alpha i}b_{\beta j}a_{ij} - r^2 a_{\alpha\beta})$, which requires

$$\lambda^\alpha(b_{\alpha i}b_{\beta j}a_{ij} - r^2 a_{\alpha\beta}) = 0,$$

giving, for $r^2$, the equation

$$\left|\,b_{\alpha i}b_{\beta j}a_{ij} - r^2 a_{\alpha\beta}\,\right| = 0. \tag{28.117}$$

Now in virtue of (28.114) this reduces to

$$\left|\,r^2 a_{\alpha\beta} - a_{ij}a^{im}a_{\alpha m}a^{jn}a_{\beta n}\,\right| = 0$$

or

$$\left|\,r^2 a_{\alpha\beta} - a^{ij}a_{\alpha i}a_{\beta j}\,\right| = 0, \tag{28.118}$$

which is equivalent to (28.108) with a slight change of notation. This must be so, for
we arrived at both equations on essentially the same assumptions. Now we see that the
term on the right in the determinant of (28.118) is the first item on the right of the variance
analysis given by (28.115), and the other term in the determinant is the sum $\Sigma(y^2)$ of the
analysis. It follows that z and t of (28.111) are independent, for they are the constituent
items of the analysis. Furthermore, the z's will be distributed as sums of squares or products
about the means with n − q degrees of freedom, that is in Wishart's form; and
similarly the t's are distributed as q sums of squares or products about the origin, i.e. in
Wishart's form with n = q + 1.


28.30. Without loss of generality we may take the parent variances to be unity; 
the covariances are zero by hypothesis. The joint distribution of z and t is then, from 
(28.26), 


dF = 


| t |* ta-*- 1 * | z |* exp | ttu + %) j- IJdt dz 

2 » <»+»> a*fi | r(^ ) r ( -— \ ~ 1 j} 


(28.119) 


In the determinant

$$\left|\,\lambda^2(z + t) - t\,\right| = 0$$

put $u = \lambda^2$ and let the roots in u be arranged in descending order of magnitude. Consider
the distribution for a given value of $t_{ij} + z_{ij}$, which in particular we take to be $\delta_{ij}$. Let us
choose new variates from a set obeying the orthogonality conditions

$$\sum_k \xi_{ik}\xi_{jk} = 0 \ \text{if } i \ne j, \qquad = 1 \ \text{if } i = j. \tag{28.120}$$

Make the transformation

$$t_{ij} = \sum_k(\xi_{ik}\xi_{jk}u_k), \tag{28.121}$$

$$t_{ij} + z_{ij} = \sum_k(\xi_{ik}\xi_{jk}) = \delta_{ij}. \tag{28.122}$$

Instead of the $\tfrac{1}{2}p(p+1)$ values of $t_{ij}$ we will take the p values of u and $\tfrac{1}{2}p(p-1)$ of the
ξ's as our new variates. We have

$$|t| = \left|\,\xi_{ik}\xi_{jk}u_k\,\right| = \prod_{k=1}^{p}u_k \tag{28.123}$$

$$|z| = \left|\,\xi_{ik}\xi_{jk}(1 - u_k)\,\right| = \prod_{k=1}^{p}(1 - u_k) \tag{28.124}$$

and have only to consider the Jacobian. This is clearly of degree $\tfrac{1}{2}p(p-1)$ in u, for the
Jacobian of t and z + t is the same as that of t and z and only t contributes factors in u
in the former. Furthermore, every term $(u_i - u_j)$, i < j, is a factor of J. For consider
$u_1 - u_2$ and let us take as our ξ-variates those for which j > i. Then to satisfy the conditions
on the others, derivable from (28.120),


we must have 


whence 


af- E (£ih £jk) = 

__ _ £ii _ £g 

Six Six 

#*-o, j>2, 

<7*1* 

^TjT- — a J- % (€ik %Jk u k) 

Off It Offl* 

ffn 


(28.125) 


Thus every term («< — u } ) occurs in J, and there can be no further factors in u because 
the power in u is \p {p — 1). 

Substituting in (28.119) we have, integrating out the f-variates, 

dF = c h {u\ to-*-" (1 - utf }n(u ( - Uj) ndu 

i—1 

where 

The constant k arises from terms involving n and p in the original density and from the 
Jacobian. It therefore does not involve q and may be written k (n, p ). Evaluation of 
k by direct integration is a matter of some difficulty, but we may find it indireotly 
as follows:— 

In (28.126), if we increase q and n by 2s, the corresponding value of o is 


. (28.126) 

. (28.127) 


k(n + 2s, p) 



q + 1 + 2a — i 


M 



. (28.128) 


The only other term in (28.126) which is affected is that in $\Pi(u)$ and, with the original
c of (28.127), the integral of the distribution so modified would give us the moment of
order s of $\Pi(u)$, namely of $|t|$. This may be found in the manner of 28.15 to be

■ ^g + 1 + 2s — i'j + 1 — i'j 

' n + 2s + 1 — : 


n 


(see Exercise 28.11). It follows that 


■j' 2 s — i 
2 


whence 


r( n 

k{n + 2a, p) _ n \ 

*(»>?>) "rj ^t — 


■1 


(28.129) 


k (n 


, P )=nr(^^±y(p). . . . 


(28.130) 


(28.131) 


(28.132) 


It remains to evaluate f (p). To do so we make the substitution in (28.126) 

2v { 

u i = -S, 
n 

letting n tend to infinity. Our distribution becomes 

dF = f exp (— £ v { ) n (v t - v } ) 77 dv. . 

\ 2 / 

This may be reduced by successive substitutions of the type 

«i = Wx, Vj =Wf + V lt j > 1, 

and choosing q at each stage so that the term in 77 (v) vanishes (as we may, since the result 
is independent of q). On integration for v„ then repeating the process, and so on, we find 

f(p ) _ n r(p + i —») = j 

4-2"—” 


nr 


Using the relation 
we have 




2 kv (p- i) 


F(x) r(x + i) = 2-**+ 1 V*r(2x), 


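$$f(p) = \frac{\pi^{\frac{1}{2}p}}{\prod_{i=1}^{p}\Gamma\{\tfrac{1}{2}(p + 1 - i)\}}. \tag{28.133}$$

Thus our distribution is finally

$$dF = c\prod_{i=1}^{p}\left\{u_i^{\frac{1}{2}(q-p-1)}(1 - u_i)^{\frac{1}{2}(n-p-q-2)}\right\}\prod_{i<j}(u_i - u_j)\prod du, \tag{28.134}$$

where

$$c = \pi^{\frac{1}{2}p}\prod_{i=1}^{p}\frac{\Gamma\{\tfrac{1}{2}(n - i)\}}{\Gamma\{\tfrac{1}{2}(q + 1 - i)\}\,\Gamma\{\tfrac{1}{2}(n - q - i)\}\,\Gamma\{\tfrac{1}{2}(p + 1 - i)\}}, \tag{28.135}$$

a remarkable form obtained in the general case by Fisher (1939b), P. L. Hsu (1939b), and
Roy (1939b).

We have supposed throughout that q ≥ p. In the contrary case we reverse the roles
of q and p and hence merely have to interchange p and q in (28.134) and (28.135).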

28.31. Let us consider some special cases. When q = 1 the distribution becomes

$$dF = \frac{\Gamma\{\tfrac{1}{2}(n-1)\}}{\Gamma\{\tfrac{1}{2}(n-p-1)\}\,\Gamma(\tfrac{1}{2}p)}\,u_1^{\frac{1}{2}(p-2)}(1 - u_1)^{\frac{1}{2}(n-p-3)}\,du_1, \tag{28.136}$$

confirming the distribution of equation (28.40) leading to Hotelling's distribution; for
the canonical correlation is then the multiple correlation between the q-variate and the
p-variates; and as the former is measured from its mean there is one fewer degree of
freedom, i.e. n is replaced by n − 1.

When q = 2 we have

$$dF = \frac{\Gamma(n-2)}{4\,\Gamma(n-p-2)\,\Gamma(p-1)}\,(u_1u_2)^{\frac{1}{2}(p-3)}\{(1-u_1)(1-u_2)\}^{\frac{1}{2}(n-p-4)}(u_1 - u_2)\,du_1\,du_2. \tag{28.137}$$

Writing

$$(1 - u_1)(1 - u_2) = v, \qquad u_1 + u_2 = w,$$

we find

$$dF = \frac{\Gamma(n-2)}{4\,\Gamma(n-p-2)\,\Gamma(p-1)}\,(v - 1 + w)^{\frac{1}{2}(p-3)}\,v^{\frac{1}{2}(n-p-4)}\,dv\,dw. \tag{28.138}$$

For given v the limits of w are 1 − v and $2(1 - \sqrt{v})$, and integrating for w we find

$$dF = \frac{\Gamma(n-2)}{4\,\Gamma(n-p-2)\,\Gamma(p-1)}\cdot\frac{2}{p-1}\,(1 - \sqrt{v})^{p-1}(\sqrt{v})^{n-p-4}\,dv$$

or, for $\sqrt{v}$,

$$dF = \frac{1}{B(n-p-2,\,p)}\,(1 - \sqrt{v})^{p-1}(\sqrt{v})^{n-p-3}\,d\sqrt{v}, \tag{28.139}$$

a result due to Wilks (cf. equation (28.62)).
28.32. The distribution of the u's does not immediately provide a test of significance
of the canonical correlations, except when there is only one of them. The criterion

$$v = \Pi(1 - u) \tag{28.140}$$

is sometimes useful in the general case for testing simultaneously the departure of the
u's from zero. Cf. Exercises 28.11 and 28.12.
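
By way of illustration, the criterion can be evaluated for the roots of Example 28.4 and referred to the χ² approximation stated in Exercise 28.12. This is a sketch under assumptions: it identifies the sample value z of that exercise with the criterion v, and it only computes the statistic, leaving the comparison with χ² tables to the reader.

```python
import math

n, p, q = 140, 2, 2
u = [0.155635, 0.004740]      # squared canonical correlations, Example 28.4

v = math.prod(1.0 - ui for ui in u)                  # criterion (28.140)
# Multiplier from Exercise 28.12: n - 1 - (p + q + 1)/2.
stat = -(n - 1 - 0.5 * (p + q + 1)) * math.log(v)
dof = p * q                                          # degrees of freedom

print(round(v, 4), round(stat, 2), dof)
```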


NOTES AND REFERENCES 

Among earlier papers in which various aspects of the multivariate problem began to
be studied, reference may be made to Karl Pearson (1926b) on the "coefficient of racial
likeness" and Ragnar Frisch (1929), who independently arrived at the dispersion matrix
and proposed to call its determinant in standard measure the "scatterance". Reference
to the papers by Wishart (1928), Wishart and Bartlett (1933c) and Hotelling (1931) on the
generalised product-moment distribution and the generalised "Student" ratio has been
made in the text.

In more recent literature three lines of development are discernible:

(a) American writers have developed the theory of canonical correlation and multiple
analysis mainly on algebraic and analytical lines. See Hotelling (1933, 1936b), Wilks
(1932e, 1934, 1935b, 1935c, 1936, 1943), Girshik (1939), and Madow (1938).

(b) English schools have investigated the theory of discriminant functions and developed
the sampling theory of canonical roots. See R. A. Fisher (1936a, b, 1938c, 1939b,
1940tf), P. L. Hsu (1938c, 1939b, 1941a, c, d), and for illustrative material Martin (1936),
Barnard (1935), Fairfield Smith (1936) and Wallace and Travers (1938). See also Bartlett
(1934b, 1938c, 1939b, c, 1941), E. S. Pearson and Wilks (1933b), Welch (1939b), Lawley
(1938) and Bishop (1939). Simaika (1941) has proved that tests based on Hotelling's T
and the multiple correlation coefficient are uniformly most powerful in the class depending
on a single parameter.

(c) The Indian school, whose contribution has not been referred to in this chapter,
has developed some interesting work based on what is known as the D²-statistic. See
Mahalanobis (1930, 1936a), Mahalanobis, Bose and Roy (1936b), R. C. Bose (1936a), R. C.
Bose and Roy (1938c), and later papers in Sankhyā. If, with two samples from p-variate
populations, $d_i$ is the difference of sample means for the ith variate, the studentised
D²-statistic is

$$D^2 = \frac{1}{p}\,a^{ij}d_i d_j$$

where $a^{ij}$ refers to the reciprocal of the sample dispersion matrix. Bose and Roy have
shown that in normal samples this has the same distribution as one of Fisher's forms for
the multiple correlation coefficient. The corresponding parameter for the population

$$\Delta^2 = \frac{1}{p}\,\alpha^{ij}d_i d_j$$

is known as Mahalanobis's generalised distance.


EXERCISES 

28.1. In a four-variate normal distribution show that the correlation between the
covariances $a_{12}$ and $a_{34}$ is

$$\frac{\rho_{13}\rho_{24} + \rho_{14}\rho_{23}}{\{(1 + \rho_{12}^2)(1 + \rho_{34}^2)\}^{\frac{1}{2}}}.$$

(Wishart, 1928.)


28.2. For a pair of normal variates with correlation p, show r that, defining v by 


<Xi<Ta(l -P 2 )’ 

we have for the frequency function of v 

vs*.-, 
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for v > 0 and a similar expression with — v for v inside curly brackets if v < 0. Here 
K is the Bessel function of second kind with imaginary argument.

(Wishart and Bartlett, 1933c. See also K. Pearson and others, 1929.)


28.3. Show that if k sets of variates $a^{(h)}_{ij}$, h = 1 ... k; i, j = 1 ... p are each
distributed in Wishart's form, with sample numbers $n_1 \ldots n_k$, then the variates

$$a_{ij} = \sum_{h=1}^{k}a^{(h)}_{ij}$$

are also distributed in Wishart's form with $n = \sum_{h=1}^{k}(n_h)$. (This follows readily from the
characteristic function. It is a generalisation of the additive properties of $\chi^2$.)


28.4. If a sample of n is chosen from a p-variate normal population, the variates 
being grouped into * classes x u x t . . . x Pi ; x Pi+1 . . . x Pl+Pt ; . . .; x Pl +.. Pt _ 1+ i 
. . . x p , consider the function— 


W 



where r u — 1 and is zero if the variates belong to different classes and equals the cor¬ 
relation ry if they belong to the same class. 

By considering the function 

A = W in 

show that 


k Pt 

p;(W)= nn 


<-X i-1 


1 




r <r?) 


(Wilks, 1936b. The distribution provides a test of the independence of k sets of normal variates.) 


28.5. As a particular case of the last exercise, show that if a single variate x x is 
independent of a second set x t . . . x p , then— 



and hence find the distribution of the multiple correlation coefficient when the parent 
coefficient is zero. 


(Wilks, 19366.) 


28.6. Show algebraically that Hotelling’s T is invariant under linear transformations 
of the p variates. 


28.7. If the determinantal equation (28.83) with p = q has a double root equal to
zero, show that for large samples the value of r corresponding to the canonical correlation
is given by omitting all terms in the determinant when expanded, except those in $\lambda^2$ and
$\lambda^0$. Noting that the latter is a perfect square, show that r is the ratio of a polynomial
in the sample dispersions to a non-vanishing function regular in the neighbourhood of
zero. Hence that (28.107) holds when ρ = 0.

(Hotelling, 1936b.)


28.8. In the notation of 28.23, if 

A =* \ o, ‘ 


00 l> 


C = 


0 


C iOL ' 



* - I I 

D = 


°atfi 


°ia 



show that the vector correlation coefficient K defined by 

1)*_0 

K AB 

and the square of the vector alienation coefficient Z defined by 

D 


Z = 


AB 


are invariant under linear transformations of the variates. Also that

$$K = \pm\rho_1\rho_2 \ldots \rho_p$$

$$Z = (1 - \rho_1^2)(1 - \rho_2^2) \ldots (1 - \rho_p^2)$$

where the ρ's are canonical correlations.

(Hotelling, 1936b.)


28.9. In the notation of the previous exercise, k and z being the sample values of 
K and Z, show that if the population canonical correlations are all distinct. 


var k = 



-A)* \ 
p? J 


var z 


i-1 


cov ( k , *) ** — \ KZ (1 — Pi)- 
n i=x 

In particular, when p = 2, 

var k = - { (1 - X*)« - Z (1 + K*) } 
n 

varz = — (1 -Z+K*) 
n 

cov (k, z) = -|KZ (1 +Z- K*). 


(Hotelling, 1936b.)

28.10. In the previous exercise, with p = q = 2, show that, in standard measure,

$$k = \frac{r_{13}r_{24} - r_{14}r_{23}}{\{(1 - r_{12}^2)(1 - r_{34}^2)\}^{\frac{1}{2}}}$$

and hence derive a test of significance of the "tetrad difference" $r_{13}r_{24} - r_{14}r_{23}$.

(Hotelling, 1936b.)

28.11. In the notation of Exercise 28.9, show that 




+ 2 p- 


(Girshik, 1939.) 


28.12. Find the characteristic function of − log z, where z is defined as in the
previous exercise, and hence show that − n log z or, to a better approximation,
− {n − 1 − ½(p + q + 1)} log z tends to be distributed as $\chi^2$ with pq degrees of freedom
when n is large.

(Bartlett, 1938c.)



CHAPTER 29 




TIME-SERIES—(1) 


29.1. A time-series, as its name indicates, is a series of values assumed by a variable
at different points of time. We shall consider only cases where the variable is univariate
and shall denote its value at time t by $u_t$. The study of such series forms an important
branch of statistics because the majority of types of time-variation encountered in practice
are not of the regular functional type in which $u_t$ can be represented exactly by a mathematical
function of t, but present in some degree those irregularities of a random character
which can only be discussed in terms of probability. One of our main problems, in fact,
will be to isolate systematic from casual effects in the series so as to be able to study
them separately.


29.2. In general it is possible to observe a time-variable at any instant, and thus
the temporal intervals between successive members of the series need not be the same.
Practice and theory alike, however, usually require the observations to occur at regular
intervals, and in the sequel we shall assume, unless the contrary is specifically stated, that
the interval from each observation to the next is the same throughout the series. As
a matter of convenience we may take this interval as our time-unit and write the series as

$$u_1, u_2, u_3, \ldots u_t, \ldots \tag{29.1}$$

where t must be an integer. Where a series extends backwards and forwards from some
given point which we wish to regard as origin we may write it as

$$\ldots u_{-t}, \ldots u_{-2}, u_{-1}, u_0, u_1, u_2, \ldots \tag{29.2}$$

In this chapter and the next we shall study the way in which $u_t$ varies with t, such variation
being in general of the stochastic type, that is to say, involving random variables.


Some Examples of Time-series 

29.3. Tables 29.1 to 29.5 provide some examples of the kind of variation encountered
in practice. Table 29.1 (illustrated in Fig. 29.1) gives the annual yields per acre of barley
in England and Wales from 1884 to 1939. Table 29.2 (Fig. 29.2) shows the human
population of England and Wales at ten-yearly intervals from 1811 to 1931. Table 29.3 (Fig. 29.3)
gives the sheep population of England and Wales for each year from 1867 to 1939.
Table 29.4 (Fig. 29.4) gives the annual rainfall in London for each year from 1813 to 1912.
Table 29.5 (Fig. 29.5) gives the average egg-production per laying hen in the U.S.A. for
each month of the years 1938 to 1940.
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TABLE 29.1 


Annual Yields per Acre of Barley in England and Wales from 1884 to 1939.

(Data from the Agricultural Statistics.)

Yield per acre in cwts.

Year   Yield     Year   Yield     Year   Yield     Year   Yield
1884   15.2      1898   16.9      1912   14.2      1926   16.0
1885   16.9      1899   16.4      1913   15.8      1927   16.4
1886   15.3      1900   14.9      1914   16.7      1928   17.2
1887   14.9      1901   14.5      1915   14.1      1929   17.8
1888   15.7      1902   16.6      1916   14.8      1930   14.4
1889   15.1      1903   15.1      1917   14.4      1931   15.0
1890   16.7      1904   14.6      1918   15.6      1932   16.0
1891   16.3      1905   16.0      1919   13.9      1933   16.8
1892   16.5      1906   16.8      1920   14.7      1934   16.9
1893   13.3      1907   16.8      1921   14.3      1935   16.6
1894   16.5      1908   15.5      1922   14.0      1936   16.2
1895   15.0      1909   17.3      1923   14.5      1937   14.0
1896   15.9      1910   15.5      1924   15.4      1938   18.1
1897   15.5      1911   15.5      1925   15.3      1939   17.5

Fig. 29.1. Graph of the Data of Table 29.1 (Barley Yields per Acre).
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TABLE 29.2 

Population of England and Wales at Ten-Yearly Intervals from 1811 to 1931.
(Data from the Registrar-General’s Statistical Review, 1933, Part II.)

Year    Population (millions)
1811    10.16
1821    12.00
1831    13.90
1841    15.91
1851    17.93
1861    20.07
1871    22.71
1881    25.97
1891    29.00
1901    32.53
1911    36.07
1921    37.89
1931    39.95

Fig. 29.2. Graph of the Data of Table 29.2 (Population of England and Wales).

TABLE 29.3 


Sheep Population of England and Wales for each Year from 1867 to 1939.

(Data from the Agricultural Statistics.)

Population in units of 10,000.

Year   Population     Year   Population     Year   Population     Year   Population
1867   2203           1886   1892           1905   1823           1924   1484
1868   2360           1887   1919           1906   1843           1925   1597
1869   2254           1888   1853           1907   1880           1926   1686
1870   2165           1889   1868           1908   1968           1927   1707
1871   2024           1890   1991           1909   2029           1928   1640
1872   2078           1891   2111           1910   1996           1929   1611
1873   2214           1892   2119           1911   1933           1930   1632
1874   2292           1893   1991           1912   1805           1931   1775
1875   2207           1894   1859           1913   1713           1932   1850
1876   2119           1895   1856           1914   1726           1933   1809
1877   2119           1896   1924           1915   1752           1934   1653
1878   2137           1897   1892           1916   1795           1935   1648
1879   2132           1898   1916           1917   1717           1936   1665
1880   1955           1899   1968           1918   1648           1937   1627
1881   1785           1900   1928           1919   1512           1938   1791
1882   1747           1901   1898           1920   1338           1939   1797
1883   1818           1902   1850           1921   1383
1884   1909           1903   1841           1922   1344
1885   1958           1904   1824           1923   1384

Fig. 29.3. Graph of the Data of Table 29.3 (Sheep Population).
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TABLE 29.4 


Total Annual Rainfall at London in Inches, for each Year from 1813 to 1912.

(Data from D. Brunt, Phil. Trans. A, 225, 247, 1926.)

Rainfall in inches.

Year   Rainfall     Year   Rainfall     Year   Rainfall     Year   Rainfall
1813   23.66        1838   21.63        1863   21.59        1888   27.74
1814   26.07        1839   27.49        1864   16.93        1889   23.86
1815   21.86        1840   19.43        1865   29.48        1890   21.23
1816   31.24        1841   31.13        1866   31.60        1891   28.15
1817   23.65        1842   23.09        1867   26.25        1892   22.61
1818   23.88        1843   25.85        1868   23.40        1893   19.80
1819   26.41        1844   22.66        1869   25.42        1894   27.94
1820   22.67        1845   22.75        1870   21.32        1895   21.47
1821   31.69        1846   26.36        1871   25.02        1896   23.52
1822   23.86        1847   17.70        1872   33.86        1897   22.86
1823   24.11        1848   29.81        1873   22.67        1898   17.69
1824   32.43        1849   22.93        1874   18.82        1899   22.54
1825   23.26        1850   19.22        1875   28.44        1900   23.28
1826   22.67        1851   20.63        1876   26.16        1901   22.17
1827   23.00        1852   35.34        1877   28.17        1902   20.84
1828   27.88        1853   25.89        1878   34.08        1903   38.10
1829   26.32        1854   18.65        1879   33.82        1904   20.65
1830   26.08        1855   23.06        1880   30.28        1905   22.97
1831   27.76        1856   22.21        1881   27.92        1906   24.26
1832   19.82        1857   22.18        1882   27.14        1907   23.01
1833   24.78        1858   18.77        1883   24.40        1908   23.67
1834   20.12        1859   28.21        1884   20.35        1909   26.75
1835   24.34        1860   32.24        1885   26.64        1910   25.36
1836   27.42        1861   22.27        1886   27.01        1911   24.79
1837   19.44        1862   27.57        1887   19.21        1912   27.88

TABLE 29.5

Average Number of Eggs per Laying Hen in the U.S.A. for each Month of the Years 1938-1940.

(Data from Report of the Bureau of Agricultural Economics, U.S. Dept. of Agriculture, on the
Poultry and Egg Situation, March, 1941.)

Year   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.
1938    7.9    9.9   15.4   17.6   17.3   14.9   13.6   11.8    9.4    7.6    5.9    6.4
1939    8.0    9.7   14.9   17.0   17.0   14.0   13.2   11.7    9.3    7.4    6.0    6.8
1940    7.2    9.0   14.4   16.5   17.0   14.8   13.4   11.8    9.7    7.9    6.2    6.8


These series are fairly typical of the kind of material with which our theory has to
deal. The data of Table 29.1 (barley yields) present a very irregular fluctuation, and so
far as the eye can see (which is not a decisive test) there is no systematic oscillation and no
regular movement in mean yields over the period. By contrast, Table 29.2 (human
population) shows a relatively smooth movement without apparent oscillation. Table 29.3 (sheep
population) combines a general decline in numbers with marked oscillatory effects which,
though not perfectly regular, appear to be systematic to some extent. Tables 29.4 and
29.5 exhibit an oscillatory effect which is definitely seasonal for the latter and much less
regular for the former, neither indicating a variation, in the periods covered, of the average
values about which the series oscillate.


29.4. It must not be overlooked that our method of determining the values of the
series at fixed equal intervals of time may suppress evidence of oscillatory movements
which have a period equal to those intervals or to some sub-multiple of them. Suppose,
for instance, that there was a systematic oscillation in the English population expressible
by a harmonic component with period of exactly 10 years, or exactly 5 years, or exactly
3⅓ years. Clearly, by observing the series at 10-yearly intervals we should never find any
evidence of this effect, for it would contribute exactly the same amount to each observation,
without oscillation. In the population case, of course, we have collateral evidence to
indicate that no such oscillation exists, but where nothing is known of the series otherwise
we can never exclude the possibility of a period exactly equivalent to our time-interval.
Sometimes, in fact, we know that it is there, and choose our interval so as to exclude the
oscillation from consideration. For instance, in our sheep population we know that there
is a seasonal effect within the year, which is not brought out in Table 29.3 because the
sheep census is taken on June 4th each year; and again, in the rainfall data of Table 29.4
we have taken as representing the year the whole rainfall within the year, knowing quite
well that rainfall is seasonal to some extent, even in London.
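
The suppression of an oscillation whose period equals the sampling interval, or a sub-multiple of it, can be checked numerically. The following is a minimal sketch and not part of the original text: the function names, the unit amplitude and the phase of 0.3 are assumptions chosen purely for illustration.

```python
import math

def harmonic(period, times, phase=0.3):
    """Values of a unit-amplitude harmonic of the given period (years) at the given times."""
    return [math.sin(2 * math.pi * t / period + phase) for t in times]

def spread(values):
    """Range (maximum minus minimum) of a list of observations."""
    return max(values) - min(values)

census_years = [10 * k for k in range(13)]   # thirteen 10-yearly observations
annual_years = list(range(13))               # thirteen yearly observations

# Periods equal to the interval (10 years) or to a sub-multiple of it (5 and 10/3 years)
# contribute the same amount to every 10-yearly observation.
decade_spreads = {p: spread(harmonic(p, census_years)) for p in (10.0, 5.0, 10.0 / 3.0)}

# The same 10-year harmonic is plainly visible when observed every year.
annual_spread = spread(harmonic(10.0, annual_years))

for period, s in decade_spreads.items():
    print(f"period {period:5.2f} years, 10-yearly observations: spread = {s:.1e}")
print(f"period 10.00 years, yearly observations:     spread = {annual_spread:.3f}")
```

Apart from rounding error the 10-yearly observations are constant, whereas the yearly observations of the same harmonic swing through almost its full range.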

29.5. A general survey of these and similar series suggests that the typical
time-series may be regarded as composed of three parts:

(a) a trend, or long-term movement;

(b) an oscillation about the trend of greater or less regularity;

(c) a “random”, “irregular” or “unsystematic” component.

It is customary to regard the series as composed of these elements superposed one on
another; that is to say, we consider the movement of the series as the sum of three
different components which may be generated by different causal systems. Particular series,
of course, need not exhibit them all. That of Table 29.2 (human population) seems
to be almost entirely trend, with perhaps a small unsystematic residual, whereas that of
Table 29.5 (egg production) appears to be entirely oscillatory, and very regularly so.
But some series at least exhibit all three.


29.6. The primary problem of time-series analysis from the statistical viewpoint is
to isolate the three factors for individual study, and in this chapter and the next we shall
be mainly concerned with various methods of carrying out the necessary analysis. Before
proceeding, however, we must look a little more closely into the reality of the effects which
we are investigating and the basis on which we assume that the analysis is legitimate.

29.7. Perhaps the easiest component to understand and to remove from the series
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon
external to the main body of causal influences at work upon it. The oscillation in
egg-production in Table 29.5, for instance, reflects the rhythm in the reproductive process
which is found among birds in virtue, ultimately, of the fact that the earth goes round
the sun once a year. Strictly speaking, we ought to confine the word “seasonal” to those
effects which are annual in period; but where no confusion is likely to arise we can apply
the same word and the same ideas to any phenomenon generated by strictly periodic natural
processes, such as “spring” and “neap” variation in tides or daily variation in
temperature. We must, however, be careful about extending the notion of seasonality to phenomena
which are not demonstrated beyond reasonable doubt to depend on strictly periodic stimuli.
For instance, it would be going too far, in the present state of our knowledge, to speak of
sunspot variation as seasonal in this sense, and much too far to speak of seasonality in
crop yields as determined by sunspots, even if the relation between the two were
established. We shall return to this point below when defining what we mean by a “cycle”
as distinct from an “oscillation”.

29.8. As we noted in 29.4, the seasonal effect may already be removed from the
series by the way in which the data are specified. Where we ourselves have any choice
in the determination of the data, we may eliminate seasonality in the same way, namely,
by selecting for measurement of the series a point of time which is fixed in relation to the
year, such as June 4th for the agricultural returns of England and Wales, or by averaging
over the year, or (what is much the same thing) by cumulating the series over the year,
as for instance with rainfall data.
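
As an illustration of cumulation over the year, the sketch below sums the monthly egg-production figures for 1938 and 1939 as read from the egg-production table (decimal points restored). It is not part of the original text; the variable names are assumptions, and only those two years are used.

```python
from fractions import Fraction

# Monthly eggs per laying hen, January to December, as read from the egg-production table.
eggs = {
    1938: "7.9 9.9 15.4 17.6 17.3 14.9 13.6 11.8 9.4 7.6 5.9 6.4",
    1939: "8.0 9.7 14.9 17.0 17.0 14.0 13.2 11.7 9.3 7.4 6.0 6.8",
}
monthly = {year: [Fraction(x) for x in row.split()] for year, row in eggs.items()}

# Cumulating over the year gives one value per year, free of the seasonal swing.
annual_total = {year: sum(values) for year, values in monthly.items()}
annual_mean = {year: total / 12 for year, total in annual_total.items()}

# Size of the seasonal swing within each year, for comparison.
seasonal_range = {year: max(v) - min(v) for year, v in monthly.items()}

for year in monthly:
    print(year, float(annual_total[year]), float(seasonal_range[year]))
```

The monthly values range over about eleven eggs within each year, while the two annual totals differ by fewer than three, which is the sense in which cumulation removes the seasonal component.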


29.9. The concept of trend is more difficult to define. Generally, one thinks of it
as a smooth broad motion of the system over a long term of years, but “long” in this
connection is a relative term, and what is long for one purpose may be short for another. For
example, if we were examining rainfall records over a hundred years a slow rise from the
beginning of the period to the end would be regarded as a trend; but if we possessed records
for two thousand years (and the rings in some of the giant redwood trees give an index of
climatic conditions for periods of this order) the rise over a particular century might appear
as part of a slow oscillatory movement, so that any inference from the “trend” in a
particular century to the effect that the weather was likely to continue becoming wetter and
wetter might be quite false. What inference we should make in practice would depend
on what we were trying to do. If we were engineers designing a water-supply system and
wished to provide against droughts of reasonable extent, we might perhaps assume that the
trend would last as long as our works and proceed accordingly; but if we were attempting
to study climatic changes over the face of the earth for geological periods of time we should
accept the continuance of the trend with the greatest reserve or, more probably, should
reject it on collateral grounds.


29.10. However long a series may be, we can never be certain, and often not even
reasonably sure, that a trend in it is not part of a slow oscillation, except of course when
the series has terminated (as might, for instance, be the case if we were considering the
lengths of reigns of the Roman Emperors). In speaking of a trend, therefore, we must
bear in mind the length of the series to which our statement refers. Perhaps it would be
more accurate to speak of slow or quick movements rather than of trend and oscillation,
but even so the distinction between the two would remain a matter of subjective judgment
to some extent.

29.11. When seasonal variation and trend have been removed from the data we
are left with a series which will present, in general, fluctuations of a more or less regular
kind. Fig. 29.1 represents the kind of series we obtain, since it has no components of
trend or seasonality. The question then arises, is this residual series systematic in the
sense that its values can be represented as a function of the time? Or, on the other hand,
are the values random in the sense that they could occur in the observed order by random
sampling from a homogeneous population? Or again, is there some possibility intermediate
between complete functional variation and complete randomness? The search for
systematic effects in residual fluctuation gives rise to several techniques of analysis, the object
of which is to detect whether any part of the series is subject to law, and therefore
predictable, and whether any part is purely haphazard. The former part we shall call systematic,
and it will be referred to as an “oscillation” (not a “cycle”, which is a very special case
of an oscillation, as we shall see later). The remainder of the series we shall call the
unsystematic component, and refer to its movements as “random”. When a series is a mixture
of oscillation and random movement it will not cause any inconvenience to refer to the
up-and-down movement generally as fluctuation before we have analysed it into its
constituents; that is to say, we may speak of fluctuation without prejudice to the possibility
of detecting oscillatory movements in it.

In this chapter we study trend and random residuals. In the next chapter we shall
deal with oscillatory and cyclical components.

29.12. The logician or the economist who wants to be difficult can always maintain 
that, although any series can be separated into our three specified components as a matter 
of mathematical or statistical analysis, the results throw little or no light on the causal 
influences at work to produce the series. To such a critic we have to concede, I think, 
that in carrying out the analysis we have at the back of our minds the strong possibility 
that the three elements are due to independent causal systems. If he refuses to accept 
this view (and some economists do) we can only invite him to produce a better statistical
method. 

Possibly the reader will feel, on reaching the end of Chapter 30, that we have not been 
wasting our time, and that our methods do throw light on the way in which time-series 
behave. If not, he should consult some of the references and see whether he finds them 
statistically more satisfying. 

Determination of Trend 

29.13. It is an essential part of the concept of trend that the movement over fairly
long periods is smooth. This means that we can represent the trend component, at least
locally, by a polynomial in the time element $t$. Thus, given the series $u_t$, we may, in the
first instance, seek for some polynomial

$$u_t = a_0 + a_1 t + a_2 t^2 + \ldots + a_p t^p \qquad (29.3)$$

which will give an account of the trend movement. By taking $p$ great enough we can, of
course, obtain as close a representation as we like to a finite series; and how large we
take $p$ is a matter for decision in particular cases.

If the polynomial is fitted to the whole series by least squares, it evidently gives the
curvilinear regression line of $u_t$ on the variable $t$. This method would then lead to the
fitting of regressions in the manner of Chapter 22, and we need not repeat here what has
been said on the subject in that chapter. In Example 22.7 we did, in fact, fit a quartic
to the population data of Table 29.2 and found a good fit.
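
Fitting (29.3) to a whole series by least squares amounts to solving the normal equations for the coefficients. The sketch below is an illustration only and not the procedure of Chapter 22: the helper names `solve`, `polyfit_ls` and `evaluate` are assumptions, exact rational arithmetic is used so that the checks are free of rounding, and the barley example uses just the first ten yields of Table 29.1 with $t = 0$ at 1884.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the linear system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def polyfit_ls(ts, us, p):
    """Coefficients a_0, ..., a_p of the least-squares polynomial of degree p."""
    ts = [Fraction(t) for t in ts]
    A = [[sum(t ** (j + k) for t in ts) for k in range(p + 1)] for j in range(p + 1)]
    b = [sum(t ** j * u for t, u in zip(ts, us)) for j in range(p + 1)]
    return solve(A, b)

def evaluate(coeffs, t):
    """Value of a_0 + a_1 t + ... + a_p t^p."""
    return sum(a * Fraction(t) ** j for j, a in enumerate(coeffs))

# A series that is exactly cubic is recovered exactly by a cubic fit.
cubic = polyfit_ls(range(1, 11), [(t - 1) ** 3 for t in range(1, 11)], 3)

# Straight-line trend through the first ten barley yields of Table 29.1 (t = 0 is 1884).
yields = [Fraction(x) for x in "15.2 16.9 15.3 14.9 15.7 15.1 16.7 16.3 16.5 13.3".split()]
line = polyfit_ls(range(10), yields, 1)
residuals = [u - evaluate(line, t) for t, u in enumerate(yields)]

print([float(a) for a in line])
```

The residuals from the fitted line sum to zero and are uncorrelated with $t$, which is simply a restatement of the normal equations.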

29.14. It is, however, clear that to obtain a satisfactory trend-curve for data such
as that of Table 29.3 (sheep population), we should have to take a polynomial of rather
high order. This may appear somewhat artificial and in any case the coefficients of such
a polynomial, being based on high-order moments, would be very unstable from the sampling
viewpoint. A more practical objection, though by no means an unimportant one, is that
if we add another term to the series, as for example if we are keeping an annual series up
to date from year to year, the work of fitting has to be done afresh each time. Moreover,
the trend-line may be affected throughout its length. When, therefore, the series has no
very obvious trend such as that of Table 29.2 it is more convenient to use the simpler
method described below.
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Moving Averages 

29.15. An alternative to finding a polynomial which will represent the whole series
is to determine a polynomial which will represent a part of it, and to use different
polynomials for different parts. The simplest method, and one which forms the basis of the
majority of methods of trend fitting, is to take the first $m$ terms ($m$ being chosen at will),
fit a polynomial of order $p$, not greater than $m - 1$, to them, and use that polynomial to
determine the value in the middle of its range; then to repeat the operation with the $m$
terms from the second to the $(m + 1)$th, and so on, moving on one term at each stage.
Unless other considerations require it, we take $m$ to be odd, so that the middle point of
the range corresponds to a value which is actually observed. Otherwise the middle point
falls half-way between two observed values, or we have to use some value of the fitted
polynomial other than the middle point, which results in a loss of useful symmetry.

29.16. Suppose, then, that the number of terms is chosen to be odd and is denoted,
with a slight change of notation, by $2m + 1$. Without loss of generality we may denote
the terms by $u_{-m}, u_{-m+1}, \ldots, u_0, \ldots, u_{m-1}, u_m$. If we choose to fit to them a
polynomial of the $p$th order (29.3) we may, in the usual way, determine the coefficients by
least squares, i.e. solve the equations

$$\frac{\partial}{\partial a_j} \sum_{t=-m}^{m} (u_t - a_0 - a_1 t - \ldots - a_p t^p)^2 = 0, \qquad j = 0, 1, \ldots, p \qquad (29.4)$$

which will give us equations typified by

$$\Sigma(t^j u_t) - a_0 \Sigma(t^j) - a_1 \Sigma(t^{j+1}) - \ldots - a_p \Sigma(t^{j+p}) = 0. \qquad (29.5)$$

Now the sums $\Sigma(t^j)$ are functions of $m$ only. Thus, if we solve (29.5) for $a_0$ we shall find
an equation of the form

$$a_0 = c_0 + c_1 u_{-m} + c_2 u_{-m+1} + \ldots + c_{2m+1} u_m \qquad (29.6)$$

where the $c$’s depend on $m$ and $p$, but not on the $u$’s.

Now $u_t$ assumes the value $a_0$ at $t = 0$ and hence this value, as given by (29.6), is the
value we require for the polynomial. As we see, this is equivalent to a weighted average
of the observed values, the weights being independent of which part of the series is taken.
Thus our process of fitting a trend-line consists of determining the constants $c$ (which
depend on $m$ and $p$ and therefore give us a twofold element of choice) and then calculating,
for each consecutive set of $(2m + 1)$ terms in the series, a value given by (29.6). If the
terms are $u_x, \ldots, u_{2m+x}$, the calculated value will correspond to $t = m + x$. There will
be no values corresponding to the $m$ terms at the beginning and the $m$ terms at the end.
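
The weights $c$ of (29.6) can be computed directly from the normal equations (29.5). The following sketch is offered as an illustration under stated assumptions rather than as the method of the text: `solve` and `ma_weights` are hypothetical helper names, and the constant $c_0$ is taken to be zero because $a_0$ is linear and homogeneous in the $u$’s.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the linear system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def ma_weights(m, p):
    """Weights on u_{-m}, ..., u_m giving a_0 for a degree-p fit to 2m + 1 terms (p <= 2m)."""
    ts = [Fraction(t) for t in range(-m, m + 1)]
    # Matrix of the normal equations: entry (j, k) is the sum of t^(j + k).
    A = [[sum(t ** (j + k) for t in ts) for k in range(p + 1)] for j in range(p + 1)]
    # First row of the inverse matrix, obtained by solving A y = (1, 0, ..., 0).
    y = solve(A, [1] + [0] * p)
    # a_0 = sum_j y_j * sum_t t^j u_t, so the weight on u_t is sum_j y_j t^j.
    return [sum(y[j] * t ** j for j in range(p + 1)) for t in ts]

for m, p in ((2, 3), (3, 3), (3, 5)):
    print(2 * m + 1, p, [str(w) for w in ma_weights(m, p)])
```

For seven points and a cubic this gives $\frac{1}{21}[-2, 3, 6, 7, 6, 3, -2]$, and the same weights result for $p = 2$.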


Example 29.1

Suppose we have a series and wish to fit a curve which best approximates to sets of
seven points; and suppose we regard a cubic as providing a satisfactory approximation.
What are the weights of the moving average?

We have $m = 3$ and $p = 3$, and our polynomial is

$$u_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3.$$

Since the sums of odd powers of $t$ from $-3$ to $3$ vanish, the equations (29.5) for $j = 0$
and $j = 2$ involve only $a_0$ and $a_2$:

$$\Sigma(u_t) = 7 a_0 + 28 a_2, \qquad \Sigma(t^2 u_t) = 28 a_0 + 196 a_2,$$

whence

$$a_0 = \frac{1}{21}\{7\,\Sigma(u_t) - \Sigma(t^2 u_t)\}
     = \frac{1}{21}\{-2u_{-3} + 3u_{-2} + 6u_{-1} + 7u_0 + 6u_1 + 3u_2 - 2u_3\}.$$

We may write this conveniently as

$$\frac{1}{21}[-2, 3, 6, 7, 6, 3, -2]$$

or, when symmetrical formulae are used, as in the present case, by

$$\frac{1}{21}[-2, 3, 6, \mathbf{7}, \ldots],$$

denoting the middle term by heavy type.

To take a simple illustration. Suppose the series is given by the following values:

t   :   1    2    3    4    5    6    7    8    9   10
u_t :   0    1    8   27   64  125  216  343  512  729

We have, for the trend value at $t = 4$,

$$a_0 = \frac{1}{21}\{(-2 \times 0) + (3 \times 1) + (6 \times 8) + (7 \times 27) + (6 \times 64) + (3 \times 125) - (2 \times 216)\} = \frac{1}{21}\{567\} = 27.$$

Similarly, at $t = 6$ we find

$$a_0 = \frac{1}{21}\{(-2 \times 8) + (3 \times 27) + \ldots - (2 \times 512)\} = 125.$$

In both cases the trend-value is equal to the actual value of the series, and this obviously
must be so when we note that we are fitting a cubic to the series

$$u_t = (t - 1)^3.$$

It will be observed that in this example we should have obtained the same value for
$a_0$ if we fitted quadratics instead of cubics; and generally the case $p$ odd includes the
case of the next lowest (even) value of $p$, so that we need not give separate formulae for
even $p$.
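
The arithmetic of this example is easily repeated for every position of the seven-point average. The sketch below is an illustration only; `moving_average` is an assumed helper name.

```python
from fractions import Fraction

# Seven-point weights for a cubic (or quadratic) trend, as found in Example 29.1.
WEIGHTS = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]

def moving_average(series, weights):
    """Weighted moving average; one value for each complete window of len(weights) terms."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]

u = [(t - 1) ** 3 for t in range(1, 11)]   # the series 0, 1, 8, ..., 729 for t = 1, ..., 10
trend = moving_average(u, WEIGHTS)         # trend values for t = 4, 5, 6, 7

for t, value in zip(range(4, 8), trend):
    print(t, value)
```

Ten terms yield only four trend values, three being lost at each end, and each trend value coincides with the corresponding term of the series because the series is itself a cubic.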


29.17. Writing $[k]$ for the value of $a_0$ calculated in the above manner for an average
of $k$ successive terms, we find the following formulae up to $p = 5$. The reader may care
to verify them for himself as an exercise.

Quadratic and Cubic

[5]  = (1/35)   [-3, 12, 17, ...]
[7]  = (1/21)   [-2, 3, 6, 7, ...]
[9]  = (1/231)  [-21, 14, 39, 54, 59, ...]
[11] = (1/429)  [-36, 9, 44, 69, 84, 89, ...]
[13] = (1/143)  [-11, 0, 9, 16, 21, 24, 25, ...]
[15] = (1/1105) [-78, -13, 42, 87, 122, 147, 162, 167, ...]
[17] = (1/323)  [-21, -6, 7, 18, 27, 34, 39, 42, 43, ...]
[19] = (1/2261) [-136, -51, 24, 89, 144, 189, 224, 249, 264, 269, ...]
[21] = (1/3059) [-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, 329, ...]
                                                                        (29.7)

Quartic and Quintic

[7]  = (1/231)     [5, -30, 75, 131, ...]
[9]  = (1/429)     [15, -55, 30, 135, 179, ...]
[11] = (1/429)     [18, -45, -10, 60, 120, 143, ...]
[13] = (1/2431)    [110, -198, -135, 110, 390, 600, 677, ...]
[15] = (1/46,189)  [2145, -2860, -2937, -165, 3755, 7500, 10,125, 11,063, ...]
[17] = (1/4199)    [195, -195, -260, -117, 135, 415, 660, 825, 883, ...]
[19] = (1/7429)    [340, -255, -420, -290, 18, 405, 790, 1110, 1320, 1393, ...]
[21] = (1/260,015) [11,628, -6460, -13,005, -11,220, -3940, 6378, 17,655,
                    28,190, 36,660, 42,120, 44,003, ...]
                                                                        (29.8)

In each formula the last weight shown is the middle term, and the remaining weights follow by symmetry.


29.18. Several methods have been proposed to simplify the arithmetic of fitting 
a trend-line by moving averages, the large numbers in some of the expressions in (29.7) 
and (29.8) involving considerable labour in straightforward application. The simplest, 
perhaps, is that of iterated averages. 

Suppose we take an average of sets of four with equal weights (a very simple process)
and then another average of the same kind of that average. If the primary series is
$u_1, u_2, \ldots$, the result of the first operation will be to give a series

$$v_1 = \tfrac{1}{4}(u_1 + u_2 + u_3 + u_4),$$

$$v_2 = \tfrac{1}{4}(u_2 + u_3 + u_4 + u_5), \text{ etc.,}$$

and that of the second operation to give

$$w_1 = \tfrac{1}{4}(v_1 + v_2 + v_3 + v_4)
     = \tfrac{1}{16}(u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7). \qquad (29.9)$$

We may write this symbolically as

$$\{\tfrac{1}{4}[1, 1, 1, 1]\}^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}, \ldots], \qquad (29.10)$$

or, reserving the symbol $\frac{1}{k}[k]$ for a simple arithmetic mean of $k$ terms, as

$$\tfrac{1}{16}[4]^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}, \ldots]. \qquad (29.11)$$

Now compare the weights of the average derived in Example 29.1 for fitting a cubic
to seven points. Reduced to unit divisors we have for the weights of the latter

    -0.0952, 0.1429, 0.2857, 0.3333 ...

and for the weights of (29.9)

    0.0625, 0.1250, 0.1875, 0.2500 ...

The two are not identical, but they follow the same sort of course and it might be possible
to regard the latter as an approximation to the former. (We shall derive better
approximations presently, but this will serve for purposes of illustration.) Now the iterated
summation resulting in (29.9) is much easier to carry out than the single weighted averaging
process of Example 29.1. Generally, if we can find averages with simple integral weights,
preferably unity, which will, in conjunction, give approximations to the more complicated
weights of a single average, it is usually easier to use the iteration process.
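
Iterating two moving averages is the same as applying a single average whose weights are the convolution of the two sets of weights. The sketch below illustrates this for (29.9); the helper name `convolve` is an assumption.

```python
from fractions import Fraction

def convolve(a, b):
    """Weights of the single moving average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

mean_of_four = [Fraction(1, 4)] * 4
iterated = convolve(mean_of_four, mean_of_four)          # weights of (29.9)

cubic_7 = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]   # weights of Example 29.1

# Side-by-side comparison of the two sets of weights, reduced to unit divisors.
for exact, approx in zip(cubic_7, iterated):
    print(f"{float(exact):8.4f} {float(approx):8.4f}")
```

The iterated weights are $\frac{1}{16}[1, 2, 3, 4, 3, 2, 1]$, all positive, whereas the least-squares cubic weights are negative at the two ends.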

29.19. In the notation of finite differences, write

$$\Delta u_t = u_{t+1} - u_t \qquad (29.12)$$

$$E u_t = u_{t+1} = (1 + \Delta) u_t \qquad (29.13)$$

$$\delta u_t = u_{t+\frac{1}{2}} - u_{t-\frac{1}{2}} \qquad (29.14)$$

We have, for the second “central” difference $\delta^2 u_t$,

$$\delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = (E - 2 + E^{-1}) u_t. \qquad (29.15)$$

Writing

$$E = \exp(2i\phi) \qquad (29.16)$$

we find, symbolically,

$$\delta^2 = E - 2 + E^{-1} = \exp(2i\phi) + \exp(-2i\phi) - 2 = -4 \sin^2 \phi. \qquad (29.17)$$

Then

$$[2m + 1] u_0 = \sum_{j=-m}^{m} (E^j u_0) = \Big\{1 + 2 \sum_{j=1}^{m} \cos 2j\phi\Big\} u_0,$$

since the terms in $\sin 2j\phi$ vanish,

$$= \frac{\sin (2m + 1)\phi}{\sin \phi}\, u_0.$$

Thus

$$\frac{1}{k}[k] u_0 = \frac{1}{k} \frac{\sin k\phi}{\sin \phi}\, u_0 \qquad (29.18)$$

$$= \Big\{1 - \frac{k^2 - 1}{3!} \sin^2 \phi + \frac{(k^2 - 1)(k^2 - 3^2)}{5!} \sin^4 \phi - \ldots\Big\} u_0$$

$$= u_0 + \frac{k^2 - 1}{2^2 \cdot 3!}\, \delta^2 u_0 + \frac{(k^2 - 1)(k^2 - 3^2)}{2^4 \cdot 5!}\, \delta^4 u_0 + \ldots \qquad (29.19)$$

This interesting formula gives the arithmetic average in terms of the middle term $u_0$ and
its central differences.

If now our series is approximately represented by a cubic, so that fourth differences
vanish, we have

$$\frac{1}{k}[k] u_0 = u_0 + \frac{k^2 - 1}{24}\, \delta^2 u_0 \qquad (29.20)$$

and this equation will in any case be true up to third differences. Similarly, for two iterated
averages we have, to the same order,

$$\frac{1}{kl}[k][l] u_0 = u_0 + \frac{1}{24}(k^2 + l^2 - 2)\, \delta^2 u_0 \qquad (29.21)$$

and so on. We will use these results to derive two formulae in very general use by actuaries
for “graduating” a series, a process which is very similar to that of fitting a trend-line.
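
Formulae (29.20) and (29.21) hold exactly when the series is a cubic, which makes them easy to check. The sketch below does so for odd $k$ and $l$, where the averages are centred on an observed term; the cubic chosen and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def u(t):
    """An arbitrary cubic, used only as a test series."""
    return 2 - 3 * t + 5 * t ** 2 + 7 * t ** 3

def centred_mean(f, t, k):
    """Arithmetic mean of k (odd) successive values of f, centred on t."""
    m = (k - 1) // 2
    return Fraction(1, k) * sum(f(t + j) for j in range(-m, m + 1))

def iterated_mean(f, t, k, l):
    """Mean of l successive k-term means, centred on t."""
    return centred_mean(lambda s: centred_mean(f, s, k), t, l)

# Second central difference at the origin.
d2 = u(1) - 2 * u(0) + u(-1)

# Each entry pairs the computed average with the value given by the formula.
single = {k: (centred_mean(u, 0, k), u(0) + Fraction(k * k - 1, 24) * d2)
          for k in (3, 5, 7, 9)}
double = {(k, l): (iterated_mean(u, 0, k, l),
                   u(0) + Fraction(k * k + l * l - 2, 24) * d2)
          for k in (3, 5, 7) for l in (3, 5, 7)}

for k, (lhs, rhs) in single.items():
    print(k, lhs, rhs)
```

For a series with non-vanishing fourth differences the two sides would differ by the further terms of (29.19).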


Example 29.2. Spencer's 15-point Formula

Consider three successive averages with equal weights

$$\frac{1}{80}[4][4][5] u_0 = u_0 + \frac{1}{24}\{4^2 - 1 + 4^2 - 1 + 5^2 - 1\}\, \delta^2 u_0 = u_0 + \frac{9}{4}\, \delta^2 u_0.$$

We then have, to third differences,

$$u_0 = \frac{1}{80}[4]^2[5]\Big(1 - \frac{9}{4}\delta^2\Big) u_0.$$

Substituting for $\delta^2$ the formula $[1, -2, 1]$, as given by (29.15), we find

$$u_0 = \frac{1}{320}[4]^2[5][-9, 22, -9]\, u_0.$$

We can simplify the numerical coefficients to some extent. Let us
add to the factor $[-9, 22, -9]$ a term $-3\delta^4 = [-3, 12, -18, 12, -3]$. The result
is $[-3, 3, 4, 3, -3]$, giving

$$u_0 = \frac{1}{320}[4]^2[5][-3, 3, \mathbf{4}, \ldots]\, u_0.$$

This is Spencer’s 15-point formula. It covers sets of 15 consecutive terms, the weights
in full being

$$\frac{1}{320}[-3, -6, -5, 3, 21, 46, 67, \mathbf{74}, \ldots].$$
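
The full weights of the 15-point formula can be recovered by multiplying out its factors, each factor being a moving average in its own right. The following sketch is illustrative; `convolve` and `compose` are assumed helper names.

```python
def convolve(a, b):
    """Weights of the single moving average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compose(*factors):
    """Multiply out any number of moving averages."""
    result = [1]
    for factor in factors:
        result = convolve(result, factor)
    return result

# [4]^2 [5] [-3, 3, 4, 3, -3], to be divided by 320.
spencer15 = compose([1] * 4, [1] * 4, [1] * 5, [-3, 3, 4, 3, -3])
divisor = sum(spencer15)

print(divisor, spencer15)
```

The product has 15 terms, sums to 320, and has a vanishing second moment about its middle term, which is why a cubic passes through it unchanged.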


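Example 29.3. Spencer's 21-point Formula

In a similar way we find

$$\frac{1}{175}[5]^2[7] = 1 + 4\delta^2,$$

giving, to third differences,

$$u_0 = \frac{1}{175}[5]^2[7](1 - 4\delta^2)\, u_0 = \frac{1}{175}[5]^2[7][-4, 9, -4]\, u_0.$$

We now add to the factor $[-4, 9, -4]$ the expression

$$-3\delta^4 - \tfrac{1}{2}\delta^6 = [-3, 12, -18, 12, -3] + [-\tfrac{1}{2}, 3, -7\tfrac{1}{2}, 10, -7\tfrac{1}{2}, 3, -\tfrac{1}{2}]$$

giving

$$u_0 = \frac{1}{175}[5]^2[7][-\tfrac{1}{2}, 0, \tfrac{1}{2}, 1, \tfrac{1}{2}, 0, -\tfrac{1}{2}]\, u_0
     = \frac{1}{350}[5]^2[7][-1, 0, 1, \mathbf{2}, \ldots]\, u_0.$$

This is Spencer’s 21-point formula.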
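
As with the 15-point formula, the 21 weights follow by multiplying out the factors, and the result can be tested on a series that is exactly cubic. The sketch below is illustrative only: the helper names and the particular cubic are assumptions (the cubic is not that of any table in this chapter).

```python
from fractions import Fraction

def convolve(a, b):
    """Weights of the single moving average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def moving_average(series, weights):
    """Weighted moving average; one value for each complete window."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]

# [5]^2 [7] [-1, 0, 1, 2, 1, 0, -1], to be divided by 350.
integer_weights = [1]
for factor in ([1] * 5, [1] * 5, [1] * 7, [-1, 0, 1, 2, 1, 0, -1]):
    integer_weights = convolve(integer_weights, factor)
spencer21 = [Fraction(w, 350) for w in integer_weights]

# An arbitrary cubic series of 51 terms; the trend is defined for the middle 31 of them.
series = [(t - 26) ** 3 - 4 * (t - 26) for t in range(1, 52)]
trend = moving_average(series, spencer21)

print(integer_weights)
```

The weights in full are $\frac{1}{350}[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \ldots]$, and the trend reproduces the cubic exactly at every point where it is defined.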

29.20. A few practical points arising in the application of the foregoing formulae
are worth mentioning.

(a) The order in which the iterations are carried out is of course immaterial, as the
reader can easily verify. It is therefore more convenient, as a rule, to carry out the more
complicated operations first, while the numbers being handled remain small. For instance,
in applying the Spencer 15-point formula we should carry out the moving average
$[-3, 3, 4, 3, -3]$ first, then apply the simple average [5], and then the two averages
of four. This does not apply if the series is short, inasmuch as there are fewer of the final
than of the initial operations.

(b) The use of a moving average of extent $2k + 1$ involves the absence of $k$ terms at
the end and $k$ terms at the beginning of the trend-series. If the original series is short the
loss may be serious, and this effect sometimes restricts considerably the extent of the
average which we are able to apply.

(c) It is possible to remedy the deficiency at the ends of the series by special formulae,
but the values so derived have less reliability than those of the main trend-line, and on
the whole it seems better to accept the loss of $2k$ terms unless trend-values for the beginning
and end of the series are really essential.



(d) As yet we have given no guide as to the choice of most suitable values of $m$ and $p$.
In practice we do not usually require to fit curves of degree higher than five, and often
a cubic is sufficient, as is assumed in the Spencer formulae. There is greater elasticity in
the choice of $m$, but the point mentioned in (b) above requires $m$ to be as small as possible,
consistent with other requirements. We shall see later in the chapter that the
variate-difference method gives some further guide as to $p$, and that certain effects of
trend-elimination on random elements bear on the extent determined by $m$.

(e) There is a voluminous literature on trend-fitting which appears to me out of
proportion to the importance of the subject. It is not difficult to pursue inquiries on the
above lines to the point of extreme apparent precision and great mathematical complexity,
and perhaps such work is valuable where the series is fairly smooth and not disturbed
seriously by sampling variation or superposed random fluctuation. But many of the
series encountered in statistical practice will not bear the weight of great refinement in
trend-fitting. The student will probably find that a knowledge of fitting by moving
averages will be sufficient for all ordinary and many extra-ordinary purposes.


The Effect of Trend-elimination on Other Components 

29.21. In Table 29.6 we have applied the Spencer 21-point formula to an artificial 
series obtained by adding a random element to a cubic. Specifically, 


- (t - 26) + 


(t - 26) 3 +e,$s 


(29.22) 


The component e t was taken from tables of random numbers and consists of samples from 
a population, in which all integral values from 0 to 99 are equally frequent. The various 
columns of the table illustrate the process of fitting, and we may note in passing that for 
aperies as short at) this it is convenient to leave the more difficult summations to the last 
there are substantially of them. 

Now we know that the Spencer formula will fit a cubic exactly, so that when we
subtract the trend from the original series we ought to eliminate the systematic constituent
entirely and be left with our random component, except in so far as we have rounded off the
systematic element to the nearest unit. A comparison of columns (2) and (9) in Table 29.6,
remembering that the latter includes an element 49.5 equal to the mean of the random
component, shows that we do not do so. The reason is not far to seek. The moving
average has acted on the random element itself and determined a trend-line in it.

The results of applying the Spencer 21-point formula to the random element $\varepsilon_t$ are
shown in column (11). We should expect that if the method were perfect the values in
this column would be 49.5, the mean of $\varepsilon_t$, apart from irregular sampling effects; but
not only do the observed values deviate from this mean, they do so systematically, the
values having a small oscillatory movement which is shown as part of the trend.
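The two statements above (a cubic is reproduced exactly, yet a random series acquires a "trend" of its own) can be checked numerically. The following is only an illustrative sketch, not the computation of Table 29.6: it assumes the commonly tabulated Spencer 21-point weights (numerators over 350), uses a hypothetical linear coefficient of 0.5 in the cubic, and draws its own pseudo-random integers from 0 to 99 instead of the tabled values.

```python
import random

# Assumed: the commonly tabulated Spencer 21-point weights, as numerators
# over 350, listed from the centre outwards (the formula is symmetric).
_HALF = [60, 57, 47, 33, 18, 6, -2, -5, -5, -3, -1]
SPENCER_21 = [w / 350 for w in _HALF[:0:-1] + _HALF]


def moving_average(series, weights):
    """Weighted moving average; the result is len(weights) - 1 terms shorter."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]


def cubic(t):
    # Hypothetical systematic part: the linear coefficient 0.5 is illustrative.
    return 0.5 * (t - 26) + (t - 26) ** 3 / 100


times = list(range(1, 52))
systematic = [cubic(t) for t in times]

rng = random.Random(0)                      # fixed seed, illustration only
noise = [rng.randint(0, 99) for _ in times]

trend_of_cubic = moving_average(systematic, SPENCER_21)   # refers to t = 11 .. 41
trend_of_noise = moving_average(noise, SPENCER_21)

# The cubic is reproduced exactly, up to floating-point error ...
max_error = max(abs(a - cubic(t)) for a, t in zip(trend_of_cubic, times[10:]))
print(max_error)
# ... but the smoothed noise is not constant at 49.5: it drifts slowly.
print([round(x, 1) for x in trend_of_noise])
```

The drift in the second printed list is the "trend-line" determined in the random element; subtracting the fitted trend from the full series therefore removes part of the random component as well.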

29.22. This effect can assume considerable importance, particularly if we are eliminating
trend so as to concentrate attention on oscillations. We proceed to examine it more
closely.

Suppose that we have a series composed of the sum of three parts, a trend $\phi_1(t)$, an
oscillatory term $\phi_2(t)$, and a random element $\phi_3(t)$, so that

$$u_t = \phi_1 + \phi_2 + \phi_3. \tag{29.23}$$
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TABLE 29.6 

Series given by Equation (29.22) with Trend-Line determined by a Spencer 21-point Formula. 
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If we determine the trend by a moving average, denoted by an operation $T$, then clearly

$$T u_t = T\phi_1 + T\phi_2 + T\phi_3. \tag{29.24}$$

Let us now suppose that our method of determining trend is perfect in the sense that
$T\phi_1 = \phi_1$. Then, on subtracting (29.24) from (29.23) to eliminate trend, we find

$$u_t - T u_t = (\phi_2 - T\phi_2) + (\phi_3 - T\phi_3). \tag{29.25}$$

The point of present interest is that the terms $T\phi_2$ and $T\phi_3$ in (29.25) may distort
the genuinely oscillatory parts of the residual series and induce spurious oscillatory
movements.

29.23. Consider the simple case when $\phi_2$ is a sine term, $\sin(\alpha + \lambda t)$, $t$ being integral.
Since

$$\sum_{t=1}^{k} \sin(\alpha + \lambda t) = \frac{\sin \tfrac{1}{2}k\lambda}{\sin \tfrac{1}{2}\lambda}\, \sin\{\alpha + \tfrac{1}{2}(k + 1)\lambda\}, \tag{29.26}$$

a simple moving average of k consecutive terms will result in a sine series of the same
period and phase as the original, but with the amplitude reduced by the factor

$$\frac{1}{k}\,\frac{\sin \tfrac{1}{2}k\lambda}{\sin \tfrac{1}{2}\lambda}. \tag{29.27}$$

Iteration q times will reduce the amplitude by the qth power of this factor.

Thus the term $T\phi_2$ will be small if k is large, q is large, or if $\tfrac{1}{2}k\lambda$ is a multiple of $\pi$,
that is, if the extent of the moving average is a period of the oscillation. But if $\lambda$ is small
and $k\lambda$ is small the amplitude is reduced very little and $\phi_2 - T\phi_2$ will largely disappear,
i.e. the moving average will partially obliterate the term in $\phi_2$. In this case, $k\lambda$ being
small, the extent of the moving average is small compared with the period of the harmonic
term, that is to say the oscillation is a slow one. This result is what we should expect.
A slow oscillation is treated as a trend by the moving average and eliminated accordingly.
Generally, the moving average will emphasise the shorter oscillations at the expense of the
longer ones. Furthermore, if the extent of the average is slightly greater than the period,
the term (29.27) may have a negative sign, and consequently the difference from the trend
may somewhat exaggerate the true oscillations.
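The behaviour of the factor (29.27) in the three cases just described can be seen in a few lines. This is a minimal sketch; the function names `gain`, `simple_average` and `check`, and the particular periods and extents, are illustrative choices and not part of the text.

```python
import math


def gain(k, lam):
    """Factor (29.27): amplitude ratio for a simple k-term average of sin(alpha + lam*t)."""
    return math.sin(k * lam / 2) / (k * math.sin(lam / 2))


def simple_average(series, k):
    """Simple (equal-weight) moving average of extent k."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


def check(k, period, alpha=0.3, n=60):
    """Largest gap between the averaged sine and gain * (sine at the window centre)."""
    lam = 2 * math.pi / period
    series = [math.sin(alpha + lam * t) for t in range(1, n + 1)]
    averaged = simple_average(series, k)
    g = gain(k, lam)
    # Window i covers t = i+1 .. i+k, whose centre is t = i + (k + 1) / 2.
    return max(abs(avg - g * math.sin(alpha + lam * (i + (k + 1) / 2)))
               for i, avg in enumerate(averaged))


print(check(5, 40))                      # identity (29.26) holds to rounding error
print(gain(5, 2 * math.pi / 40))         # slow oscillation: factor close to 1
print(gain(8, 2 * math.pi / 8))          # extent equal to the period: factor ~ 0
print(gain(9, 2 * math.pi / 8))          # extent just over the period: negative
```

A factor near 1 means the average absorbs the oscillation into the "trend", so that little of it survives in the residual; a negative factor means the residual amplitude exceeds the true one.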

It is not so easy to exhibit the precise effect of the moving average when the weights 
are unequal and the terms are not harmonic, but evidently the same kind of situation is 
apt to arise. 

29.24. Now consider the effect of a simple moving average (that is, one with equal
weights) on the residual element $\phi_3$, which we will suppose to be a random element $\varepsilon_t$ with
variance $v$. For the term $T\phi_3$ we have

$$T\phi_3(t) = \frac{1}{k} \sum_{j=-[\frac{1}{2}k]}^{[\frac{1}{2}k]} \varepsilon_{t+j}, \tag{29.28}$$

where $[\tfrac{1}{2}k]$ is the greatest integer which does not exceed $\tfrac{1}{2}k$. Consecutive values of $\varepsilon_t$ are
independent, but consecutive values of $T\phi_3$ are not; for $T\phi_3(a)$ and $T\phi_3(b)$ have
$k - (a - b)$ values of $\varepsilon$ in common and are correlated if $a - b < k$. Thus the series $T\phi_3$
will be much smoother than $\phi_3$, and if we proceed to further averagings will become smoother
still. We have had an example of this effect in Table 29.6, and shall meet further
examples below.
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29.25. The effect of taking a moving average of a random series will then be to
generate an oscillatory series, provided that the weights are such as to give a positive
correlation between successive members of the generated series, a condition which is always
realised in moving averages employed for trend-fitting. We shall call this the Slutzky-Yule
effect, after the two statisticians who (independently) studied it in detail.

The generated series is not regular in the cyclical sense, that is to say its peaks and
troughs do not recur at equal intervals of time, and the amplitudes of the oscillations vary
considerably. Nevertheless such oscillations present a striking resemblance to the kind
of movement which is found in practice, particularly in economic time-series, and we shall
consider them in more detail in Chapter 30. For our present purposes we require to
consider how far the process of trend-elimination itself may generate such effects in order
to be sure that oscillatory movements in a trend-free series have not been put there, so
to speak, by our own arithmetical processes.
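The condition of positive correlation between successive members is easily exhibited by simulation. The sketch below is an illustration under stated assumptions: the weights [1, 2, 3, 3, 3, 2, 1] (a moving sum of fives of a moving sum of threes), a normal random series of 5000 terms, and a fixed seed are choices made here for the demonstration only.

```python
import random

WEIGHTS = [1, 2, 3, 3, 3, 2, 1]   # moving sum of fives of a moving sum of threes


def moving_sum(series, weights):
    """Weighted moving sum of the series."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]


def lag1_correlation(series):
    """Sample correlation between successive members of the series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)


def theoretical_lag1(weights):
    """Correlation of successive members of a moving sum of independent terms."""
    return (sum(a * b for a, b in zip(weights, weights[1:]))
            / sum(a * a for a in weights))


rng = random.Random(1)                       # fixed seed, illustration only
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
smoothed = moving_sum(noise, WEIGHTS)

print(round(lag1_correlation(noise), 3))     # close to zero
print(round(lag1_correlation(smoothed), 3))  # strongly positive
print(round(theoretical_lag1(WEIGHTS), 4))   # 34/37
```

The original series shows no serial correlation, whereas successive members of the smoothed series are highly correlated, which is what gives the generated series its undulating appearance.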



29.26. For this purpose we shall consider the period and variance of a series
generated by the Slutzky-Yule effect.

Since the peaks and troughs do not recur at equal intervals there is no quantity which
we can conveniently call the length of the oscillation. There will, in fact, be a distribution
of lengths. We may define as the mean length either the mean period from peak to peak,
or that from trough to trough; but this raises some difficulties as to whether we are
prepared to admit as periods small ripples on the main undulation.

Recognising its somewhat arbitrary character, we shall take as our measure of
oscillatory length the mean distance between “upcrosses”, that is to say the mean distance
between points where the series changes sign from negative to positive or “crosses the
x-axis”. Suppose the series is generated by a moving average with weights $a_1 \ldots a_k$
of a random variable which is normally distributed with variance $v$. Then the probability
that

$$u_k = \sum_{j=1}^{k} a_j \varepsilon_j < 0 \tag{29.29}$$

and

$$u_{k+1} = \sum_{j=1}^{k} a_j \varepsilon_{j+1} > 0, \tag{29.30}$$

i.e. that the generated series changes sign from negative to positive, is the proportional
frequency of

$$dF \propto \exp\Big\{-\frac{1}{2v} \sum_{j=1}^{k+1} \varepsilon_j^2\Big\}\, d\varepsilon_1 \ldots d\varepsilon_{k+1} \tag{29.31}$$

between the hyperplanes $\sum_{j=1}^{k} a_j \varepsilon_j = 0$ and $\sum_{j=1}^{k} a_j \varepsilon_{j+1} = 0$. The distribution being
spherically symmetrical, this frequency is equal, as a fraction of $2\pi$, to the angle $\theta$
between these two planes, which is given by

$$\cos\theta = \frac{\sum_{j=1}^{k-1} a_j a_{j+1}}{\sum_{j=1}^{k} a_j^2}. \tag{29.32}$$

Hence the mean distance between upcrosses is $2\pi/\theta$, where $\theta$ is given by (29.32).
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29.27. In a similar way, the probability that

$$u_{k+1} - u_k < 0 \tag{29.33}$$

$$u_k - u_{k-1} > 0, \tag{29.34}$$

that is that $u_k$ is a peak of the series, is the angle between the two hyperplanes

$$\sum_{j=1}^{k} a_j \varepsilon_{j+1} - \sum_{j=1}^{k} a_j \varepsilon_j = 0 \tag{29.35}$$

$$\sum_{j=1}^{k} a_j \varepsilon_j - \sum_{j=1}^{k} a_j \varepsilon_{j-1} = 0 \tag{29.36}$$

and is given by

$$\cos\theta_1 = \frac{a_1 (a_2 - a_1) + (a_3 - a_2)(a_2 - a_1) + \ldots + (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - a_k (a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \ldots + (a_k - a_{k-1})^2 + a_k^2}. \tag{29.37}$$

Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies to
mean distance between troughs.


29.28. If we wish to exclude “ripples” of a certain length d from consideration
we may inquire for the probability that (29.35) and (29.36) are satisfied in conjunction with

$$u_k > u_{k+d}. \tag{29.38}$$

This is evidently the area cut off on the unit sphere by the three planes (29.35), (29.36) and

$$\sum_{j=1}^{k} a_j \varepsilon_j - \sum_{j=1}^{k} a_j \varepsilon_{j+d} = 0. \tag{29.39}$$

If the angles between the planes are A, B and C this area is $A + B + C - 2\pi = \theta_2$, say.
The mean length between peaks, ripples excepted, is then $4\pi/\theta_2$.

Example 29.4 

In Table 29.7 we show 480 terms of a series of random numbers which can take integral
values from 0 to 19, together with a moving sum of fives of a moving sum of threes.
Fig. 29.6 shows a portion of the derived series graphically. There are 474 terms of the
smoothed series.

The mean value of our series is 15 × 9.5 = 142.5. The number of upcrosses will be
found from the table to be 23, the first between the 19th and 20th term of the smoothed
series, the last between the 459th and the 460th. The mean distance between upcrosses
is then 440/22 = 20 units. How does this compare with the mean-distance given by
“normal” theory?

The weights of the graduation are [1, 2, 3, 3, 3, 2, 1] and from (29.32) we have

$$\cos\theta = \frac{1 \times 2 + 2 \times 3 + 3 \times 3 + 3 \times 3 + 3 \times 2 + 2 \times 1}{1^2 + 2^2 + \ldots + 1^2} = \frac{34}{37} = 0.9189,$$

$$\theta = 23^\circ\,14'.$$

Hence the mean distance = 360/23.233 = 15.5 units.
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TABLE 29.7 

Series of 480 Terms of a Rectangular Random Series e and a [5] [3] smoothing S . 
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Fig. 29.6.—Graph of the Last 117 Terms of the Series 8 of Table 29.7. 
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The observed mean distance is 20.0 units, but this is based on rectangular variation, and
we are, perhaps, entitled to expect some difference from normal theory. For rectangular
random variables, values distant from the mean occur more frequently, and it is not
surprising to find oscillations in the series which do not result in upcrosses.

The number of peaks in the series will be found to be 62, the first at the seventh term,
the last at the 406th. Hence, there being 61 intervals between them, the mean distance
between peaks is 7.5 units. From formula (29.37) we find

$$\cos\theta_1 = \tfrac{4}{6}, \qquad \theta_1 = 48^\circ\,11'.$$

Thus the theoretical mean distance is 360/48.187 = 7.5 units, in good agreement with
experiment. It will be observed that several of the distances between peaks are due to very
small ripples.

From a number of experiments Dodd (1939a) concluded that series generated from 
rectangular material conformed fairly well to normal theory. 
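The two theoretical distances of this example follow directly from (29.32) and (29.37). As a sketch (the function names are illustrative), the peak formula is obtained here by applying the upcross formula to the weights of the first-differenced series, which reproduces the numerator and denominator of (29.37):

```python
import math


def mean_upcross_distance(weights):
    """Mean distance 2*pi/theta between upcrosses, with cos(theta) as in (29.32)."""
    lagged = sum(a * b for a, b in zip(weights, weights[1:]))
    squares = sum(a * a for a in weights)
    return 2 * math.pi / math.acos(lagged / squares)


def mean_peak_distance(weights):
    """Mean distance between peaks: the upcross formula applied to first differences."""
    # u[t+1] - u[t] is itself a moving sum of the random series with these weights.
    diff = ([-weights[0]]
            + [a - b for a, b in zip(weights, weights[1:])]
            + [weights[-1]])
    return mean_upcross_distance(diff)


graduation = [1, 2, 3, 3, 3, 2, 1]
print(round(mean_upcross_distance(graduation), 1))   # distance between upcrosses
print(round(mean_peak_distance(graduation), 1))      # distance between peaks
```

For a simple average of nines (nine equal weights) the same function gives cos θ = 8/9 and a mean distance between upcrosses of about 13.2 units.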


29.29. Let us now examine how the variance of the induced oscillation compares 
with the variance of the original random series. 

The sum of k random elements with variance v has variance kv and its mean has
variance v/k. It does not follow that a simple moving average has a variance 1/k times
that of the random element, because of correlations between successive members in the
derived series. If the original series was $\varepsilon_1 \ldots \varepsilon_n$ the derived series is, with weights
$a_1 \ldots a_k$,

$$\begin{aligned}
a_1 \varepsilon_1 + a_2 \varepsilon_2 + \ldots + a_k \varepsilon_k &= \eta_1, \text{ say},\\
a_1 \varepsilon_2 + a_2 \varepsilon_3 + \ldots + a_k \varepsilon_{k+1} &= \eta_2,\\
&\;\;\vdots\\
a_1 \varepsilon_{n-k+1} + a_2 \varepsilon_{n-k+2} + \ldots + a_k \varepsilon_n &= \eta_{n-k+1}.
\end{aligned} \tag{29.40}$$

The expected value of the sum of these values is zero since the expected value of $\varepsilon$ may be
taken to be so. Since there are n − k + 1 terms we have for the variance

$$\frac{1}{n - k + 1} \sum \eta^2. \tag{29.41}$$

The expected value of this, since the $\varepsilon$'s are independent, is

$$\frac{1}{n - k + 1} E\{\textstyle\sum (\eta^2)\} = E(\eta^2) = (a_1^2 + a_2^2 + \ldots + a_k^2)\, v. \tag{29.42}$$

In particular, if the a's are all equal to 1/k, the expected value of the variance is v/k. This
gives us the average reduction in the variance.

If a simple average of extent k is iterated q times the weights are the successive
coefficients in

$$\frac{1}{k^q} (1 + x + x^2 + \ldots + x^{k-1})^q.$$

The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in

$$\frac{1}{k^{2q}} (1 + x + x^2 + \ldots + x^{k-1})^{2q} = \frac{1}{k^{2q}} \frac{(1 - x^k)^{2q}}{(1 - x)^{2q}}, \tag{29.43}$$
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and this gives the average reduced variance for a simple average of k iterated q times. 
The following are the values of the reducing factor for some of the values of k and q:

  k \ q       1       2       3       4       5

    3       0.33    0.23    0.19    0.17    0.15
    4       0.25    0.17    0.14    0.12    0.11
    5       0.20    0.14    0.11    0.10    0.09
    6       0.17    0.11    0.09    0.08    0.07
    7       0.14    0.10    0.08    0.07    0.06



Evidently the result of the first moving average is to generate a series with a much 
lower variance than that of the original random element, but the second and succeeding 
iterations do not reduce the variance further to the same extent. In the case k = 7 the 
first averaging reduces the variance to one-seventh, but the next three reduce it only by 
a further half. 
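The reducing factors can be recomputed from first principles by building the iterated weights and summing their squares, as in (29.42). The sketch below uses exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def iterated_weights(k, q):
    """Weights of a simple average of extent k iterated q times."""
    weights = [Fraction(1)]
    for _ in range(q):
        longer = [Fraction(0)] * (len(weights) + k - 1)
        for i, w in enumerate(weights):
            for j in range(k):
                longer[i + j] += w / k      # one further pass of the simple average
        weights = longer
    return weights


def reduction_factor(k, q):
    """Expected variance of the averaged series as a fraction of v."""
    return sum(w * w for w in iterated_weights(k, q))


for k in range(3, 8):
    print(k, [f"{float(reduction_factor(k, q)):.2f}" for q in range(1, 6)])
```

For example, a simple average of threes taken twice has weights (1, 2, 3, 2, 1)/9, whose sum of squares is 19/81, or 0.23 to two places.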

29.30. To apply such results in practice we require an estimate of the variance of
the random element in the original series. If this is available we can estimate the variance
of the generated series and also, from 29.26, the mean distance between upcrosses or
between peaks. If then our residual series, after the elimination of trend, showed an
oscillatory movement with this variance and these mean-distances, within sampling limits, we
could not conclude that the oscillatory effect was real. It could have been induced by
our method of eliminating trend.

In the present state of knowledge it is not possible to assign permissible limits of
sampling variation by relation to standard errors in the usual way. Whether any particular
effect is significantly different from the values of the series generated from the random
element remains, therefore, a matter of subjective judgment to some extent. The sampling
problems involved are formidable, but there does not seem any reason why they should
not be capable of explicit solution. This field of study awaits the attention of the theorist.

Example 29.5 

For the data of Table 29.3 (sheep population of England and Wales) trend was
eliminated by a simple average of nines, the resulting residuals being shown in Table 29.8.
A glance at the series suggests some sort of oscillatory effect, since the signs of terms cluster
together. By the methods of the next chapter the effect may be brought into greater
prominence. The data themselves, however, indicate a mean-distance between upcrosses
of about 8 or 9 years, and actual calculation gives a variance of 8474. Can this be due
to the operation of our trend-elimination on a random element in the original series?

For the mean distance between upcrosses due to a simple nine-point average we have

$$\cos\theta = \tfrac{8}{9}, \qquad \theta = 27^\circ\,16',$$

and the mean distance is 360/27.27 = 13.2 approximately. This is considerably in excess of
our observed value, but not sufficiently so to reject outright the possibility we are examining.

Since, however, the variance of residuals is 8474 this must, to have been generated
from a random series by a simple average of nines, derive from a random element with
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TABLE 29.8 


Residual Values of the Sheep Series of Table 29.3 after Elimination of Trend by a Simple 

Nine-Point Moving Average. 


  Year    Residual      Year    Residual      Year    Residual
          (10,000)              (10,000)              (10,000)

  1871     -176         1893     + 34         1915     + 19
  1872     -112         1894     -103         1916     +128
  1873     + 50         1895     -104         1917     + 97
  1874     +141         1896     - 15         1918     + 69
  1875     + 60         1897     - 23         1919     - 29
  1876     - 20         1898     + 17         1920     -174
  1877     + 12         1899     + 71         1921     -107
  1878     + 82         1900     + 35         1922     -142
  1879     +130         1901     + 16         1923     -109
  1880     - 14         1902     - 27         1924     - 23
  1881     -166         1903     - 32         1925     + 60
  1882     -179         1904     - 49         1926     +121
  1883     - 84         1905     - 61         1927     + 94
  1884     + 38         1906     - 52         1928     - 25
  1885     + 97         1907     - 24         1929     - 90
  1886     +  8         1908     + 68         1930     - 75
  1887     -  5         1909     +141         1931     + 72
  1888     -105         1910     +119         1932     +152
  1889     - 99         1911     + 66         1933     +112
  1890     + 35         1912     - 52         1934     - 64
  1891     +159         1913     -117         1935     - 87
  1892     +167         1914     - 61





variance 76,266. An estimate of the variance of the random element in the original series,
obtained by the variate-difference method which we describe below, was only 350
approximately. Making every allowance for sampling effects, we cannot do otherwise than reject
decisively the possibility that the residual oscillation is spurious in the sense of having
been induced into the data by the effect of the elimination of trend on a random element.
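The arithmetic of this example can be retraced as follows. The sketch transcribes the residuals of Table 29.8; where the sign of an entry is illegible in the printed table it is taken here as negative, an assumption which reproduces the quoted variance of 8474. Variable names are illustrative.

```python
import math

# Residuals of Table 29.8 (unit 10,000), years 1871 to 1935.
# Assumption: entries whose sign is illegible in the table are taken as negative.
RESIDUALS = [
    -176, -112, 50, 141, 60, -20, 12, 82, 130, -14, -166,
    -179, -84, 38, 97, 8, -5, -105, -99, 35, 159, 167,
    34, -103, -104, -15, -23, 17, 71, 35, 16, -27, -32,
    -49, -61, -52, -24, 68, 141, 119, 66, -52, -117, -61,
    19, 128, 97, 69, -29, -174, -107, -142, -109, -23, 60,
    121, 94, -25, -90, -75, 72, 152, 112, -64, -87,
]

n = len(RESIDUALS)
mean = sum(RESIDUALS) / n
variance = sum((x - mean) ** 2 for x in RESIDUALS) / n

k = 9                                         # simple average of nines
implied_random_variance = k * variance        # since E(variance) = v / k by (29.42)

theta = math.acos((k - 1) / k)                # (29.32) with equal weights
theoretical_upcross_distance = 2 * math.pi / theta

# Observed changes of sign from negative to positive.
upcrosses = sum(1 for a, b in zip(RESIDUALS, RESIDUALS[1:]) if a < 0 < b)

print(round(variance), round(implied_random_variance))
print(round(theoretical_upcross_distance, 1), upcrosses)
```

The implied variance of roughly 76,000 is to be set against the estimate of about 350 obtained by the variate-difference method, which is the comparison on which the conclusion of the example rests.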

29.31. We may summarise the foregoing discussion of trend-elimination as follows:
(a) The conception of a trend as a “smooth” or “regular” movement is equivalent
to the supposition that trend can be represented, at least locally, by a smooth mathematical
function and in particular by a polynomial in the time-variable.

(b) Certain series can be treated on lines formally equivalent to regression analysis;
but a more generally applicable procedure is to represent the trend by a moving
parabolic arc.

(c) The moving arc of best fit in the least-squares sense gives values which are
derivable from a moving average of the data. The weights of this average are to some extent
at choice, according to the extent of the average and the closeness of fit required in the
moving arc.

(d) A moving average of extent k sacrifices (k − 1) terms, in the sense that the derived
series is (k − 1) terms shorter than the original series. If the series is short it is usually
desirable to keep this loss to a minimum, that is, to keep the extent of the average as
short as possible.
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(e) A moving average may distort genuine oscillatory effects, in general exaggerating
the shorter variations at the expense of the longer ones, and may induce spurious oscillatory
phenomena by its action on random residuals. For harmonic components the effect is
minimised by taking the average as simple, with extent equal to the period of the
component. For random components the effect is minimised by making the sum of squares
of weights in the average a minimum, i.e. by using a simple average.


29.32. In the theory of time-series there are very few rules which can be laid down
without a good deal of proviso and caveat. It will be evident from the foregoing that there
is no golden rule in trend-fitting which can be applied irrespective of individual
circumstances. If we desire to get a close fit to the data we must use a parabola of fairly high
order, which involves a moving average with weights which are far from equal. This,
however, increases the danger of obscuring the true oscillations in the residuals. In
most practical cases it is necessary to strike a balance between conflicting requirements
by intuitive judgment as to the appropriate moving average to use.

Variate-difference Method 

29.33. We now proceed to consider the random constituent of a time-series. From
the very nature of random variation we cannot expect to derive any formula, however
approximate, which will measure the random component directly at any given point of
the series. The best we can hope to do is to determine the non-random components and
to obtain a random residual which is left unaccounted for by those components; and even
this, as we shall see in the next chapter, is not a very strong hope when oscillations appear
in the series.

On certain assumptions, however, we may determine the variance of the random
component and hence obtain a general idea of its magnitude and importance. Suppose
that the systematic part of the series can be represented, at least locally, by a polynomial.
Then successive differencing of the series will gradually eliminate the polynomial element
but will not reduce the random element correspondingly. As we proceed with the
differencing, the random element becomes more and more predominant until finally the
systematic component is negligible. Hence we can determine effectively the variance of the
random component in the differenced series, and by a simple calculation derive an estimate
of that in the original series.


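29.34. Consider the differencing of a random series $\varepsilon_t$. We have

$$\Delta \varepsilon_t = \varepsilon_{t+1} - \varepsilon_t, \tag{29.44}$$

$$\Delta^r \varepsilon_t = \varepsilon_{t+r} - \binom{r}{1} \varepsilon_{t+r-1} + \binom{r}{2} \varepsilon_{t+r-2} - \ldots + (-1)^r \varepsilon_t. \tag{29.45}$$

Without loss of generality we may suppose that the mean value of $\varepsilon_t$ is zero, and thus

$$E(\Delta^r \varepsilon_t) = 0. \tag{29.46}$$

Hence

$$\operatorname{var}(\Delta^r \varepsilon_t) = E(\Delta^r \varepsilon_t)^2
= E\Big\{\varepsilon_{t+r}^2 + \binom{r}{1}^2 \varepsilon_{t+r-1}^2 + \binom{r}{2}^2 \varepsilon_{t+r-2}^2 + \ldots\Big\}
= v\,\Big\{1 + \binom{r}{1}^2 + \binom{r}{2}^2 + \ldots\Big\}.$$
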
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The sum in curly brackets is easily evaluated from the consideration that it is the coefficient
of $x^r$ in $(1 + x)^r (x + 1)^r$, that is, equals $\binom{2r}{r}$. Hence

$$\operatorname{var}(\Delta^r \varepsilon_t) = \binom{2r}{r}\, v. \tag{29.47}$$

We may then derive an estimate of v by writing

$$v = \frac{\text{second moment about zero of } \Delta^r \varepsilon_t}{\binom{2r}{r}}. \tag{29.48}$$

It is to be noticed that we use the second moment about zero, not the observed variance
of $\Delta^r \varepsilon_t$, since the mean is known to be zero. This shortens the arithmetic to some extent.

The factor $1 / \binom{2r}{r}$ for r = 1 to 10 has the following values:

    r      C(2r, r)      1 / C(2r, r)

    1            2       0.5
    2            6       0.166667
    3           20       0.05
    4           70       0.0142857
    5          252       0.00396825
    6          924       0.00108225
    7        3,432       0.000291375
    8       12,870       0.0000777001
    9       48,620       0.0000205677
   10      184,756       0.00000541254
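The identity behind (29.47), that the squared coefficients of the rth difference sum to the central binomial coefficient, can be confirmed by differencing a unit impulse. The sketch below is illustrative; `difference` and `variance_factor` are names introduced here.

```python
from math import comb


def difference(series, r=1):
    """r-th forward difference, as in (29.44) and (29.45)."""
    for _ in range(r):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def variance_factor(r):
    """Sum of squared coefficients of the r-th difference operator."""
    impulse = [0] * r + [1] + [0] * r
    coeffs = difference(impulse, r)        # signed binomial coefficients
    return sum(c * c for c in coeffs)


for r in range(1, 11):
    print(r, variance_factor(r), comb(2 * r, r), 1 / comb(2 * r, r))
```

The second and third printed columns agree for every r, and the last column reproduces the factors tabulated above.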


29.35. Basing itself on equation (29.48) the method of variate-differences proceeds
as follows: We difference the series once, find the second moment about zero of the
resultant and divide by 2; we then difference again and find the second moment about zero,
dividing in this case by 6; and so on. If the successive estimates of v decrease, we
continue with the differencing. There will, in general, come a point when they cease decreasing
and remain constant within sampling limits (which may be rather wide). At this stage
we may suppose that we have eliminated the systematic element in the original series.
The final estimate gives us an estimate of the variance of the random element in the original
series, and the order of the difference to which we have had to go will give an indication
of the degree of the polynomial representing the systematic component.

Example 29.6

Let us apply the variate-difference technique to the series of Table 29.6. We know
from the method of constructing the series that the systematic part ought to be completely
eliminated after the third differencing, and also that the random part consists of an element
with variance 833 approximately. In fact, the random numbers from 1 to N have a
variance (N² − 1)/12 and N in this case is 100. The actual variance of the random element
in Table 29.6 is 843.
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TABLE 29.9 

Differences of the Series u t of Table 29.6. 


   t     u_t     Δ¹      Δ²      Δ³      Δ⁴       Δ⁵       Δ⁶

   1     -96     - 6      67     155     279      508     1050
   2     -90     -73     -88    -124    -229     -542    -1297
   3     -17      15      36     105     313      755     1524
   4     -32     -21     -69    -208    -442     -769    -1141
   5     -11      48     139     234     327      372      271
   6     -59     -91     -95     -93     -45      101      361
   7      32       4     - 2     -48    -146     -260     -229
   8      28       6      46      98     114      -31     -625
   9      22     -40     -52     -16     145      594     1661
  10      62      12     -36    -161    -449    -1067    -2252
  11      50      48     125     288     618     1185     1978
  12       2     -77    -163    -330    -567     -793     -876
  13      79      86     167     237     226       83     -159
  14     - 7     -81     -70      11     143      242      137
  15      74     -11     -81    -132     -99      105      551
  16      85      70      51     -33    -204     -446     -655
  17      15      19      84     171     242      209      -64
  18     - 4     -65     -87     -71      33      273      690
  19      61      22     -16    -104    -240     -417     -629
  20      39      38      88     136     177      212      216
  21       1     -50     -48     -41     -35      - 4      175
  22      51     - 2     - 7     - 6     -31     -179     -650
  23      53       5     - 1      25     148      471     1110
  24      48       6     -26    -123    -323     -639     -975
  25      42      32      97     200     316      336       41
  26      10     -65    -103    -116     -20      295      925
  27      75      38      13     -96    -315     -630     -965
  28      37      25     109     219     315      335      207
  29      12     -84    -110     -96     -20      128      316
  30      96      26     -14     -76    -148     -188      -32
  31      70      40      62      72      40     -156     -798
  32      30     -22     -10      32     196      642     1597
  33      52     -12     -42    -164    -446     -955    -1719
  34      64      30     122     282     509      764      950
  35      34     -92    -160    -227    -255     -186      141
  36     126      68      67      28     -69     -327     -991
  37      58       1      39      97     258      664     1515
  38      57     -38     -58    -161    -406     -851    -1492
  39      95      20     103     245     445      641      707
  40      75     -83    -142    -200    -196      -66      281
  41     158      59      58     - 4    -130     -347     -685
  42      99       1      62     126     217      338      509
  43      98     -61     -64     -91    -121     -171     -314
  44     159       3      27      30      50      143      432
  45     156     -24     - 3     -20     -93     -289     -745
  46     180     -21      17      73     196      456      ...
  47     201     -38     -56    -123    -260      ...      ...
  48     239      18      67     137     ...      ...      ...
  49     221     -49     -70     ...     ...      ...      ...
  50     270      21     ...     ...     ...      ...      ...
  51     249     ...     ...     ...     ...      ...      ...
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Table 29.9 shows the series and the differences up to Δ⁶. For the sums of squares
in the various columns, $S_j$ corresponding to $\Delta^j$, we find:

    S_1 =    107,541
    S_2 =    318,115
    S_3 =  1,033,513
    S_4 =  3,445,308
    S_5 = 11,720,069
    S_6 = 40,548,844

To obtain second moments we divide by 51 − j and then, to obtain the estimate of v,
by $\binom{2j}{j}$. We find the following:

    j      Estimate

    1      1075.41
    2      1082.02
    3      1076.58
    4      1047.21
    5      1011.05
    6       975.20


Curiously enough, the estimate for j = 2 is higher than that for j = 1 and there is
little difference between the various estimates. In the ordinary way we should have
concluded that the systematic component was adequately represented by a polynomial
of order 1, that is to say a straight line, and that the residual random element had a variance
of about 1000.
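The computation of this example is short enough to retrace in full. The sketch below transcribes the column u_t of Table 29.9 and applies (29.48) at each order of differencing; the function name `variate_difference` is introduced here for illustration. Forward differences are used, which differ from the tabulated columns only in sign and so give the same sums of squares.

```python
from math import comb

# The series u_t of Table 29.9 (51 terms).
U = [
    -96, -90, -17, -32, -11, -59, 32, 28, 22, 62, 50, 2, 79,
    -7, 74, 85, 15, -4, 61, 39, 1, 51, 53, 48, 42, 10,
    75, 37, 12, 96, 70, 30, 52, 64, 34, 126, 58, 57, 95,
    75, 158, 99, 98, 159, 156, 180, 201, 239, 221, 270, 249,
]


def variate_difference(series, max_order):
    """Return (order, sum of squares, estimate of v) for orders 1..max_order."""
    n = len(series)
    current = list(series)
    results = []
    for r in range(1, max_order + 1):
        current = [b - a for a, b in zip(current, current[1:])]
        s = sum(x * x for x in current)
        # Second moment about zero (divide by n - r), then divide by C(2r, r).
        results.append((r, s, s / ((n - r) * comb(2 * r, r))))
    return results


results = variate_difference(U, 6)
for r, s, estimate in results:
    print(r, s, round(estimate, 2))
```

The first two sums of squares come out as 107,541 and 318,115, giving the estimates 1075.41 and 1082.02 shown above.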

The reader must not be surprised to find discrepancies of this kind between theory
and experiment in short series; and the discrepancy is not, in fact, as big as it seems.
The variance of the original series is 6272.61. The mean square of the first difference,
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated by
the first differencing, and the method indicates, quite correctly, that the greater part of
the systematic element is linear. The random element is rather large compared with the
non-linear systematic terms, and the latter have got caught up in it; the series is too short
for the variate-difference method to disentangle them. Consider, for instance, the cubic
term $\tfrac{1}{100}(t - 26)^3$. In the original series this varies in value from −156.25 to +156.25.
First differences reduce it to $\tfrac{3}{100}(t - 26)^2$, varying from 18.75 through zero to 18.75,
whereas the random element is increased in range from 0 to 198. Already the systematic
term is being swamped by the random element, and a slight degree of accidental correlation
between the two can easily account for the increase in the mean-square of second differences.

The matter may be put in a slightly different way. Suppose that, relying on the
variate-difference method, we regarded the data as represented by a linear equation plus
a random residual. If we fitted a straight line by least squares and examined the residuals,
we should probably find very little evidence of departure from randomness. This
representation would differ from the mode of construction of the series, but it would be a possible
method of construction. Only the failure of the representation to conform to further
terms of the series would reveal its weakness.
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29.36. The variate-difference method thus provides a kind of lower limit to the
degree of the polynomial which will represent a series locally or generally. There remains
for consideration the question as to what sort of differences between successive estimates
of v can be regarded as chance effects, in order to decide when the value has reached a
stationary level. The sum of squares $S_j$ is a constant factor times the second moment,
but as its members are correlated among themselves we cannot use the variance of the
second moment to test its significance. Further, $S_j$ and $S_{j+1}$ are correlated. We proceed
to derive the sampling variance of their difference, the somewhat complicated formulae
being due to Anderson (1914).


29.37. Write

$$b_j = \binom{r}{j}. \tag{29.49}$$

Then we have, as in (29.42),

$$E\{\textstyle\sum (\Delta^r u)^2\} = (b_0^2 + b_1^2 + \ldots + b_r^2)(n - r)\,\mu_2 = \binom{2r}{r} (n - r)\,\mu_2, \tag{29.50}$$

where $\mu_2$ is the variance of u. Further

$$E\{\textstyle\sum (\Delta^r u)^2\}^2 = E\big[\{b_0 u_{r+1} - b_1 u_r + b_2 u_{r-1} - \ldots + (-1)^r b_r u_1\}^2
+ \{b_0 u_{r+2} - b_1 u_{r+1} + b_2 u_r - \ldots\}^2 + \ldots
+ \{b_0 u_n - b_1 u_{n-1} + b_2 u_{n-2} - \ldots + (-1)^r b_r u_{n-r}\}^2\big]^2. \tag{29.51}$$

Consider first of all the terms in this which result in fourth powers of u. They will
derive from

$$\begin{aligned}
&E\{b_0^2 u_{r+1}^2 + b_1^2 u_r^2 + \ldots + b_r^2 u_1^2 + b_0^2 u_{r+2}^2 + b_1^2 u_{r+1}^2 + \ldots
+ b_0^2 u_n^2 + b_1^2 u_{n-1}^2 + \ldots + b_r^2 u_{n-r}^2\}^2 \\
&\quad = E\{b_0^2 (u_n^2 + u_1^2) + (b_0^2 + b_1^2)(u_{n-1}^2 + u_2^2) + (b_0^2 + b_1^2 + b_2^2)(u_{n-2}^2 + u_3^2) + \ldots \\
&\qquad + (b_0^2 + b_1^2 + \ldots + b_{r-1}^2)(u_{n-r+1}^2 + u_r^2)
+ (b_0^2 + b_1^2 + \ldots + b_r^2)(u_{n-r}^2 + u_{n-r-1}^2 + \ldots + u_{r+1}^2)\}^2.
\end{aligned} \tag{29.52}$$

Writing now

$$B_0 = (b_0^2)^2 + (b_0^2 + b_1^2)^2 + \ldots + (b_0^2 + b_1^2 + \ldots + b_{r-1}^2)^2, \tag{29.53}$$

$$A_0 = (b_0^2 + b_1^2 + \ldots + b_r^2)^2 = \binom{2r}{r}^2, \tag{29.54}$$

we see that the term in $E(u^4)$ is

$$\{A_0 (n - 2r) + 2 B_0\}\, E(u^4). \tag{29.55}$$

The only other term appearing from (29.51) will be of type $E(u_l^2 u_m^2)$, $l \neq m$. If the reader
will write out the expansion of (29.51) he will find that the coefficients are expressible in
terms of

$$A_j = (b_0 b_j + b_1 b_{j+1} + \ldots + b_{r-j} b_r)^2 = \binom{2r}{r+j}^2 \tag{29.56}$$

and

$$B_j = (b_0 b_j)^2 + (b_0 b_j + b_1 b_{j+1})^2 + \ldots + (b_0 b_j + b_1 b_{j+1} + \ldots + b_{r-j-1} b_{r-1})^2. \tag{29.57}$$
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The expression for E (A r u)* reduces to— 

(n — 2r) A% E ( u*) + 4 { (n — 2r + 1) A\ + (n — 2r + 2) A\ + . . . 

+ A* (» -2 r + r)}E («? <) + 2 B* E (««) 

-I- 8 {B? + B\ + . . . + («? «*). (29.58) 

Dividing by $\binom{2r}{r}^2 (n - r)^2$ and subtracting $\mu_2^2$, we find the sampling variance of the
estimate of v. The expression can, however, be simplified to some extent. Putting

r-V y V « / r- 2 y V - y VO r “ 3 / r \2 / r \2 

+ 


-crcr ■ 


(29.69) 


we find, after lengthy algebraic rearrangement, 
var «• -«/1 

(»-’■>(;) 




(» " r) 


(?)' 


2/1 |_(*r) 

te‘ 


+ _ c*_. 

n — r 


2 (n — r) 


r < 


(29.60) 


If terms of order $(n - r)^{-2}$ can be neglected, this reduces to


- 3/*I L ( 2r) 

(?)' 

or, using the Stirling approximation to factorials, 


- f • + 
n — r 


2/4 


{, u _ ;^| + nl y/(2rn) }, 

n — r 


(29.61) 


(29.62) 


which is a fair approximation to (29.61), being within 3 per cent. for r as low as 6.

When the population of values of u is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula
simplifies accordingly.

29.38. In a similar way it may be shown that 

r * «u. i 


cov 


(n - r) 


( 2 ;) 


_ l 1 4 — 3/^| fl 


2t; 


n — r 


+ -«-< 
n — r 


ri *i 

Or;, 1 )- 


2» — 2r — 1 _ r + 1 
”r “l" 2 (»T-T~- 1) 


. (29.63) 
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where 


r~ i 


From (29.60) and (29.63) we can determine the variance of the difference of

$$\frac{S_r}{(n - r)\binom{2r}{r}} \quad\text{and}\quad \frac{S_{r+1}}{(n - r - 1)\binom{2r+2}{r+1}}.$$

The general formula is complicated, but for normal variation, large n and r > 6 we have,
analogously to (29.62),

f S r S r 


var « 


(n 


-O 


(n — r 



(3r-| 1 )\/(&w) 

2 (2r |- 1)* (n ~ r • 1) 


(n 



(29.64) 


The arithmetic application of the formulae has been facilitated by the preparation of tables
of the constants involved. Reference may be made to Tintner (1940), who gives tables
prepared by himself, Anderson and Zaycoff.


Example 29.7 

For the data of Table 29.3 (sheep population) an application of the variate-difference
method up to the tenth difference gave the following results:

    r      $S_r \big/ \binom{2r}{r}(n-r)$
    1      3468
    2      1442
    3       854
    4       629
    5       518
    6       448
    7       401
    8       371
    9       357
   10       347


The values here are falling steadily from r = 1 to r = 10, but very slightly towards
the end. From (29.64) for r = 6 we have for the variance of the difference, 80.7
approximately and for r = 10, 25.8 approximately. It appears that the reduction in variance
at r = 10 is losing significance, and that a moving arc of degree 10 would be sufficient to
eliminate the systematic component. It does not, of course, follow that the trend-line
must be of this degree, for we may not want to eliminate the oscillatory movements in
the trend-line.


29.39. The variate-difference method will clearly not eliminate systematic effects
such as periodic terms with very short period. Consider, for instance, the series 1, -1,
1, -1, etc. The first differences give us a series 2, -2, 2, -2, etc., second differences
4, -4, 4, -4, etc., and so on. The variance of the series of rth differences is, neglecting
effects due to the shortness of the series, $2^{2r}$ times that of the original, and the quotient
when this is divided by $\binom{2r}{r}$ tends to

$$\frac{2^{2r}\,(r!)^2}{(2r)!} \sim \sqrt{\pi r}$$

and so increases without limit. In such a case we cannot obtain an estimate of the variance
of any random element which may be present.
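
The behaviour described in 29.39 can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the mean square of the differences is used in place of the variance (the two agree here apart from end effects). It differences the series 1, -1, 1, -1, ... repeatedly and compares the quotient with $\sqrt{\pi r}$.

```python
from math import comb, pi, sqrt

def differences(series, r):
    """Return the r-th differences of a list of numbers."""
    out = list(series)
    for _ in range(r):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def mean_square(xs):
    """Mean of the squares (stands in for the variance of a zero-mean series)."""
    return sum(x * x for x in xs) / len(xs)

# The alternating series 1, -1, 1, -1, ... of 200 terms (length is an arbitrary choice).
alternating = [(-1) ** t for t in range(200)]

def quotient(r):
    """Mean square of the r-th differences divided by the binomial coefficient C(2r, r)."""
    return mean_square(differences(alternating, r)) / comb(2 * r, r)

for r in (1, 2, 5, 10):
    print(r, round(quotient(r), 3), round(sqrt(pi * r), 3))
```

The quotient keeps growing with r instead of settling to a constant, which is why no estimate of a random variance can be read off in this case.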


NOTES AND REFERENCES 

References to the fitting of polynomials are given at the end of Chapter 22. For the 
moving average see Whittaker and Robinson’s Calculus of Observations and the books by 
Macaulay (1931) and Sasuly (1934). 

Attempts have been made to use trend-lines for purposes of forecasting, and even to 
measure the standard error of a forecast—see Schultz (1930) and a discussion in Davis 
(1941). The methods proposed appear to me theoretically unsound and in practice they 
lead as a rule to such wide limits of error as to be of doubtful value ; but this is a personal 
opinion and the less sceptical reader may care to consult Davis’s book and to follow up 
the references given therein. 

For the effect of moving averages on random variables see Yule (1921) and Slutzky
(1937b), the latter being an English version of a paper published in Russian many years
earlier. See also Dodd (1939a, 1941a). Slutzky proves an interesting theorem, the
theorem of the sinusoidal limit, to the effect that repeated moving averages of certain
kinds applied to random series generate a sine-curve.

For the variate-difference method see the book by Tintner (1940), a very thorough
practical account with useful tables. The more important earlier memoirs are those by
Anderson (1914, 1923, 1926), “Student” (1914), Morant (1921), and K. Pearson and
Cave (1914).


EXERCISES 

29.1. Show that in the formulae of equation (29.7) and similar formulae of higher 
orders the sum of the weights is unity. 

29.2. By evaluating the solutions of (29.5) determinantally show that a parabolic
curve of second or third order giving a graduation

$$a_{-n} u_{-n} + a_{-(n-1)} u_{-(n-1)} + \dots + a_0 u_0 + \dots + a_n u_n$$

has

$$a_0 = \frac{3\,(3n^2 + 3n - 1)}{(2n - 1)(2n + 1)(2n + 3)}.$$

29.3. Show that the weights in the Spencer 21-point formula are

$$\frac{1}{350}\,[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \dots]$$

and that if it is applied to a random series the variance of the resultant is about one-seventh
of the original series, about the same reduction as would be given by a simple moving
average of sevens.


29.4. Show that Macaulay’s 43-point formula, 

9 “ ti2] [S] [5]* 1^-1 - 1, 0, 0, 0. 0, 0, 0, 1, ... J, 

has weights 

96()0 18, 3 °’ 4 °’ 45, 28, ~ 8 > ~ 60 - “ 122 > - 178, - 205, - 190, - 127, 

- 6 , 163, 360, 562, 760, 928, 1050, 1127, 1156, . . .] 

and that it reduces the variance of a random series about as much as a simple average 
of nines. 


29.5. Take a random series of, say, 200 terms and determine “trends” by moving
averages $\frac{1}{9}[9]$, $\frac{1}{9^2}[9]^2$ and $\frac{1}{9^3}[9]^3$. Compare the mean distances between peaks and
upcrosses with the theoretical values based on normal theory.


29.6. If $\varepsilon_t$ is a random series, show that the correlation between successive members
of $\Delta^k \varepsilon_t$ for long series is $-\dfrac{k}{k+1}$ and hence tends to -1 as k increases. Hence show
that the signs of successive terms in $\Delta^k u_t$ tend to alternate, where $u_t$ is the sum of a random
element and a systematic element representable by a polynomial; and verify by reference
to Table 29.9.


29.7. By eliminating d 2 from (29.19) show that, for a cubic curve, an accurate trend- 
line is given by 



and generalise this result. 

(Cf. J. A. Higham, J. Inst. Act. (1882-5), 23, 335 ; 25, 15, 245.) 



CHAPTER 30 
TIME - SERIES—(2) 


30.1. The present chapter is devoted to a discussion of oscillatory effects in time-
series. We shall suppose that our series is stationary, i.e. has no trend, either because the
original data contained none or because trend has been removed by one of the methods
described in the last chapter. Our typical series will then fluctuate round some constant
value which we may usually, without loss of generality, take to be zero. We shall assume
that there is a prior possibility that part of the variation at least is random. This, indeed,
is necessary if our results are to have any practical application, for most of the series
encountered in practice have some element of irregularity, however small.


TABLE 30.1

Trend-free Wheat-Price Index (European Prices) compiled by Sir William Beveridge for
the Years 1500-1869.

(From Beveridge, 1921.)

30.2. Four examples of the type of series under consideration have already occurred. 
The table of Example 21.11 (page 126) gives the deviations from a simple nine-year moving 
average of the yields of potatoes in tenths of tons per acre in England and Wales for the 
years 1888-1935. Table 29.1 (Fig. 29.1) gives the annual yields of barley in cwts. per 
acre in England and Wales for 1884-1939, no nine-year elimination of trend having been 
carried out in this case. Table 29.4 (Fig. 29.4) gives rainfall data at London over the 
century 1813-1912. Table 29.5 (Fig. 29.5) gives egg-production per laying hen in the 
U.S.A. 


TABLE 30.2 

Marriage Rate in England and Wales: Deviation from a Simple 11-Year Moving Average
for the Years 1843-1896.

Units: 1 in 10,000.

Year.   Marriage Rate.   Year.   Marriage Rate.   Year.   Marriage Rate.

1843 
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1861 
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- 12 
Tables 30.1 and 30.2 give two further examples. The first is a famous series of trend-
free wheat-price indices compiled by Sir William Beveridge and extending over 370 years,
a phenomenal length of time for economic series. The second is the deviation from a
simple 11-year moving average of marriage rates for the years 1843-1896.

Oscillation and Cycle 

30.3. We will now attempt to define more closely the sense in which we use the
words “oscillation” and “cycle”. It is particularly important to exercise great care in
the use of an accurate nomenclature because a great deal of the literature on this subject
suffers from confusion due to loose wording.

By a cyclical component of a time-series we shall mean one which is a strictly periodic
function of the time, that is to say, for which there exists a period $\omega$ such that

$$u_t = u_{t+\omega} = u_{t+2\omega} = \dots = u_{t+k\omega} = \dots \qquad (30.1)$$

whatever the value of t. The periodic functions which we shall consider in particular are
the sine and cosine functions. If the series can be represented as the sum of a cyclical
component and a random constituent, or by a cyclical component alone, we may speak
of it as a cyclical series.


30.4. If the series is not random it must move with more or less regularity about 
the mean value, and we shall then speak of it as oscillatory . The oscillatory movement 
may be in part due to random elements but must not be entirely so. A cyclical series is 
oscillatory, but an oscillatory series is not necessarily cyclical. 

An oscillatory movement may be the sum of two or more cyclical components.
Consider, for instance, the sum of two periodic terms

$$u_t = \sin\frac{2\pi t}{\omega_1} + \sin\frac{2\pi t}{\omega_2}.$$

If $\omega_1$ and $\omega_2$ are commensurable there will be numbers, and in particular a smallest number
$\omega$, which is an exact multiple of both of them. This is clearly a period of the series.
But if $\omega_1$ and $\omega_2$ are not commensurable there will be no period of this kind and the sum
will be oscillatory but not cyclical.
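
The commensurable case can be illustrated with a short numerical check. This is a sketch under assumed values: the periods 4 and 6 are chosen purely for illustration and do not come from the text, and `math.lcm` requires Python 3.9 or later.

```python
from math import sin, pi, lcm

def u(t, w1, w2):
    """Sum of two sine terms with periods w1 and w2."""
    return sin(2 * pi * t / w1) + sin(2 * pi * t / w2)

w1, w2 = 4, 6            # illustrative commensurable periods (an assumption)
period = lcm(w1, w2)     # smallest number that is an exact multiple of both

# Largest change in the series when it is shifted by the common period.
max_dev = max(abs(u(t + period, w1, w2) - u(t, w1, w2)) for t in range(100))

# Shifting by w1 alone does not reproduce the series.
shorter_dev = max(abs(u(t + w1, w1, w2) - u(t, w1, w2)) for t in range(100))

print(period, max_dev, shorter_dev)
```

Shifting by the least common multiple leaves the series unchanged to rounding error, whereas shifting by one of the component periods alone does not.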


30.5. It may be felt by the reader that we could reasonably extend the use of the
word “cyclical” to cover series which are the sum of cyclical terms; but the danger of
doing so is that within certain limits any series can be represented as a sum of harmonic
terms, even if it is not itself oscillatory, in virtue of Fourier’s theorem. Admittedly such
a representation, to be exact, must in general consist of an infinite series of terms and is
valid only in a certain range, but in practice a comparatively small number of terms often
gives quite a good approximation. We do not call a function a polynomial because it
can be expanded in powers of the variable by Taylor’s theorem; and correspondingly
we shall not call it cyclical because it can be expanded as a sum of harmonic terms by
Fourier’s theorem. On the whole it seems safer to avoid the word “cyclical” for series
which consist of a finite number of cyclical terms.

30.6. For our present purposes the main significance of the distinction we are attempting
to make is that in a cyclical series the maxima and minima (apart from disturbances
due to the superposition of a random element) occur at equal intervals of time and are
therefore predictable for a long way into the future, so long, in fact, as the constitution
of the system remains unchanged. In the oscillatory series, on the other hand, the distances
from peak to peak, trough to trough or upcross to upcross, are not equal, but vary very
considerably. Similarly, in the oscillatory series the amplitudes of the movements may
vary very substantially, whereas in a cyclical series they should be constant (again, except
in so far as superposed random elements disturb them).

30.7. Now the time-series observed in practice are very rarely cyclical as we have
defined the term. The only case among those cited at the beginning of the chapter in which
there appears to be any cyclical movement is that of egg-production per hen in Table 29.5.
The far more usual case is that of varying amplitude and period from peak to peak or upcross
to upcross. We shall therefore begin our study of oscillatory movements by considering
the kinds of scheme which can give rise to the observed phenomena; and then we shall
examine methods of deciding which of the possible schemes should be chosen as the
hypothetical representation in particular cases.

Tests for Randomness

30.8. The first stage, when confronted with a fluctuating stationary series, is to
examine whether the fluctuations are purely random. Tests of randomness are easy to
find, and in fact the random series is the happy hunting-ground of the worker whose interests
lie mainly in the mathematics of the direct theory of probability. We have considered
some tests which are appropriate to the study of oscillatory movement in 21.43 to 21.46.
Others which have gained popularity are based on the distribution of “runs” and on the
correlation between successive members of the series. The reader will have no difficulty
in composing others. All these tests are based on the non-parametric case, so that the
alternative hypotheses are not usually brought specifically into view. We cannot therefore
apply the general theory of Chapters 26 and 27 to determine “best” tests, and in the
present state of knowledge are forced to be content with less definite ideas. So far as
ease of application goes, the tests of 21.43 and 21.44 seem to have decided advantages,
though they may be somewhat insensitive. The method of serial correlation, to which we
refer below, gives a useful alternative in doubtful cases. In the sequel we shall suppose
that before proceeding to search for systematic movements we have satisfied ourselves by
one or more of these tests that such movements exist.

30.9. We shall consider three schemes which can account for the typical oscillatory 
movement usually observed. 

(a) Moving Averages.—We have already seen in Chapter 29 that a moving average
of a purely random element can generate an oscillatory series with all the required properties 
of varying amplitude and mean distances—the Slutzky-Yule effect (29.25). Fig. 29.fi 
illustrates the kind of oscillation which may arise. It is at least possible that some of the 
observed oscillations in time-series may be generated in this way; and in fact Slutzky 
(1936) has given an interesting example in which a part of his series generated by the 
moving average happens to agree very closely with an observed series. 

(b) Sums of Cyclical Components.—We may attempt, by Fourier analysis or the more
general harmonic analysis, to represent the oscillations as the sum of a number of cyclical 
components. This is the classical approach. 

(c) Autoregression Equations.—If a series is constructed by the recurrence formula

$$u_{t+1} = f(u_t, u_{t-1}, \dots, u_{t-k}) + \varepsilon_{t+1}, \qquad (30.2)$$

where f is a mathematical function and $\varepsilon$ a “disturbance” function which may be a random
variable, then under certain conditions the generated series is of the required type. We
shall consider in particular the series

$$u_{t+2} = -a\,u_{t+1} - b\,u_t + \varepsilon_{t+2}, \qquad (30.3)$$

where a and b are constants and $\varepsilon$ is random.

Table 30.3 (Fig. 30.1) shows a series of type (b) in the simplest case where only one
cyclical component is involved, together with a random residual. Table 30.4 (Fig. 30.2)
shows an autoregressive series constructed from random numbers by the formula

$$u_{t+2} = 1.1\,u_{t+1} - 0.5\,u_t + \varepsilon_{t+2}. \qquad (30.4)$$
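
A series of the kind defined by (30.4) is easy to generate by machine. The following is a minimal sketch, not the procedure used to construct Table 30.4: the function name, the starting values, the burn-in length and the seed are all assumptions made for illustration, and the output will not reproduce the tabulated figures.

```python
import random

def autoregressive_series(n, a=-1.1, b=0.5, half_range=9.5, burn_in=100, seed=0):
    """Generate n terms of u[t+2] = -a*u[t+1] - b*u[t] + eps[t+2].

    eps is a rectangular (uniform) random variable on (-half_range, half_range).
    With a = -1.1 and b = 0.5 this is the scheme u[t+2] = 1.1*u[t+1] - 0.5*u[t] + eps.
    """
    rng = random.Random(seed)
    u = [0.0, 0.0]                       # arbitrary starting values (assumption)
    for _ in range(n + burn_in):         # burn-in lets the starting values die away
        eps = rng.uniform(-half_range, half_range)
        u.append(-a * u[-1] - b * u[-2] + eps)
    return [round(x) for x in u[-n:]]    # last n terms, rounded to the nearest unit

series = autoregressive_series(65)
print(series)
```

Plotting such a series shows oscillations of varying amplitude and spacing, of the same general character as those in Fig. 30.2.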
TABLE 30.3

Values of the Series $u_t = 10 \sin\dfrac{\pi t}{5} + \varepsilon_t$, where $\varepsilon_t$ is a Rectangular Random
Variable with Range -5 to +5, rounded off to Nearest Unit.

Number of            Number of            Number of
Term.      Series.   Term.      Series.   Term.      Series.
    1         3         21        11         41         5
    2         8         22        13         42        12
    3         6         23        10         43         7
    4         2         24         6         44         5
    5        -4         25        -5         45         3
    6        -7         26        -8         46        -2
    7        -9         27       -12         47       -12
    8        -9         28       -10         48       -12
    9       -10         29        -7         49        -8
   10        -1         30         0         50        -1
   11         8         31         1         51        11
   12         7         32         8         52        13
   13         6         33        13         53        12
   14         4         34         7         54         7
   15        -3         35         4         55         5
   16       -10         36        -9         56        -1
   17       -11         37        -9         57        -6
   18       -15         38        -6         58       -14
   19        -4         39        -4         59        -8
   20         4         40        -2         60         1



TABLE 30.4

Values of Series $u_{t+2} = 1.1\,u_{t+1} - 0.5\,u_t + \varepsilon_{t+2}$, where $\varepsilon_{t+2}$ is a Rectangular Random
Variable with Range -9.5 to 9.5, rounded off to Nearest Unit.

Number    Value of    Number    Value of    Number    Value of
of Term.  Series.     of Term.  Series.     of Term.  Series.
    1         7          23        -4          45       -13
    2         6          24        -5          46         1
    3        -6          25        -9          47         6
    4        -4          26        -4          48         4
    5         3          27        -4          49        11
    6        -4          28         3          50        15
    7        -5          29         9          51         9
    8        -1          30         4          52         8
    9        10          31        -8          53         4
   10        10          32        -6          54        -1
   11         6          33        -3          55         4
   12        -4          34        -2          56         7
   13        -4          35         0          57        11
   14        -7          36        -1          58         0
   15        -2          37        -3          59         1
   16         6          38         3          60         0
   17        17          39        -1          61        -5
   18        24          40        -8          62       -11
   19        17          41        -3          63        -8
   20         4          42        -8          64        -3
   21         1          43       -10          65         5
   22        -5          44       -10

Fig. 30.2.—Graph of the Values of Table 30.4.


30.10. It is quite possible that theoretical reasons may suggest other schemes for
study as the subject progresses. For instance, we might wish to consider series defined
by differential equations, on the analogy of the similar equations determining oscillations
in physical phenomena such as vibrating strings or electrical discharges. Something has,
in fact, already been done in this direction. We shall, however, confine our attention
to the three schemes indicated above, and particularly the second and third.

30.11. On the face of it, an observed series exhibiting the typical movements in 
amplitude and period might be due to any one of the three schemes or even to a combination 
of them. We require, in the first instance, some objective criterion for deciding which of 
them is applicable in particular cases. Inspection of the primary data, though useful, is 
quite an unreliable guide in making a decision on this point, particularly if the series 
is short. Experience seems to indicate that few things are more likely to mislead in the
theory of oscillatory series than attempts to determine the nature of the oscillatory movement
by mere contemplation of the series itself; and yet this is the method, if one can
dignify it by such a term, which has perhaps been most widely used in the past.

Serial Correlation

30.12. Suppose our series of values is $u_1, \dots, u_n$. Let us form the product-moment
correlation coefficient between successive terms, i.e.

$$r_1 = \frac{\operatorname{cov}(u_j, u_{j+1})}{(\operatorname{var} u_j \,\operatorname{var} u_{j+1})^{1/2}}. \qquad (30.5)$$

There will be (n - 1) pairs entering into the correlation, and the variances of $u_j$ and $u_{j+1}$
differ only in the fact that the first relates to the terms $u_1, u_2, \dots, u_{n-1}$ and the second
to the terms $u_2, u_3, \dots, u_n$. The coefficient $r_1$ is called the serial correlation coefficient
of the first order, or more briefly the first serial correlation.*

More generally, let us define a coefficient of order k:

$$r_k = \frac{\operatorname{cov}(u_j, u_{j+k})}{(\operatorname{var} u_j \,\operatorname{var} u_{j+k})^{1/2}}, \qquad (30.6)$$

where

$$\operatorname{cov}(u_j, u_{j+k}) = \frac{1}{n-k}\sum_{j=1}^{n-k} u_j u_{j+k} - \frac{1}{(n-k)^2}\Big(\sum_{j=1}^{n-k} u_j\Big)\Big(\sum_{j=1}^{n-k} u_{j+k}\Big). \qquad (30.7)$$

By convention we define

$$r_{-k} = r_k. \qquad (30.8)$$

' 30.13. In practice we often require to calculate serial correlations up to r w and for 

long series as many as 60. The arithmetic is tedious but may be systematised so as to 
reduce labour, which arises chiefly in the determination of cross-products forming the 
covariances. 

The series of n terms is written down vertically on each of two slips of paper, the spacing
being equal on the two slips. This can very conveniently be done on a Burroughs tabulator
with a split keyboard, the series being recorded in duplicate and the resulting strip cut up
the middle. To calculate the first product-sum we pin the slips so that the first term
on the right-hand slip is opposite the second term on the left-hand slip, and hence so that
the jth term on the right is opposite to the (j + 1)th on the left all the way down. For
most series the differences of two terms which are opposite can be obtained mentally by
subtraction, squared, and set up on an adding-machine. The sum of squares of differences
is thus determined, and the cross-product found from the simple identity

$$2\,\Sigma(XY) = \Sigma(X^2) + \Sigma(Y^2) - \Sigma(X - Y)^2.$$

We then move the right-hand slip down one space so that the jth term is opposite the
(j + 2)th term on the left and repeat the process; and so on to as many terms as may
be required.

In this process $\Sigma(X^2)$ and $\Sigma(Y^2)$ are required at each stage, and it is as well to determine
them by cumulative summation from the two ends of the series. $\Sigma(X)$ and $\Sigma(Y)$
are also required. It is also convenient on occasion to reduce the series to zero mean
approximately before beginning the analysis.

* It is sometimes convenient to confine this expression to values calculated from samples, the
corresponding values for the infinite series being termed “autocorrelations” and denoted by a Greek $\rho$.

Example 30.1 

To illustrate the arithmetic we will take a very trivial example which the reader should
check for himself. Take the series

-5, -6, -2, 4, 7, 3, 1, -5, -1, 2.

We set up the following scheme of tabulation for calculating serial correlations up to the
fifth order:

          Σ(X)          Σ(Y)        Σ(X²)         Σ(Y²)
n - k  k  (from begin-  (from end   (from         (from    Σ(X - Y)²   Σ(XY)
          ning of       of series)  beginning)    end)
          series)
 10    0      -2           -2         170          170         0        170
  9    1      -4            3         166          145       143         84
  8    2      -3            9         165          109       344        -35
  7    3       2           11         140          105       445       -100
  6    4       1            7         139           89       380        -76
  5    5      -2            0         130           40       172         -1

The number n - k is the number of pairs entering into the kth correlation. Σ(X) is the
sum of n - k terms beginning at the first term, Σ(Y) the corresponding sum of the last
n - k terms, and similarly for Σ(X²) and Σ(Y²). These are the quantities required to
calculate the variances entering into the denominator of the kth serial correlation. The
quantities Σ(X - Y)² are calculated by the moving-slip method described above.

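We now calculate the correlation coefficients in the usual way, e.g. for $r_1$,

$$\operatorname{var} X = \frac{166}{9} - \Big(\frac{-4}{9}\Big)^2 = 18.247, \qquad \operatorname{var} Y = \frac{145}{9} - \Big(\frac{3}{9}\Big)^2 = 16.000,$$

$$\operatorname{cov} = \frac{84}{9} - \Big(\frac{-4}{9}\Big)\Big(\frac{3}{9}\Big) = 9.4815, \qquad r_1 = \frac{9.4815}{\sqrt{(18.247 \times 16)}} = 0.555,$$

and for $r_5$,

$$\operatorname{var} X = \frac{130}{5} - \Big(\frac{-2}{5}\Big)^2 = 25.840, \qquad \operatorname{var} Y = \frac{40}{5} - \Big(\frac{0}{5}\Big)^2 = 8.000,$$

$$\operatorname{cov} = \frac{-1}{5} - \Big(\frac{-2}{5}\Big)\Big(\frac{0}{5}\Big) = -0.200, \qquad r_5 = -0.01.$$

When n is large and the origin is chosen so that the mean of the whole series is approximately
zero, a sufficiently good value of r is given by

$$\frac{\Sigma(XY)}{\{\Sigma(X^2)\,\Sigma(Y^2)\}^{1/2}},$$

the corrections required to adjust the sums of squares and products to values about the mean being small;
but this approximation must be used with some care and in any case the first two or three
serial coefficients should be worked out exactly.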
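
The tabulation of Example 30.1 can be mirrored in a few lines of code. The sketch below is illustrative only: the function name is hypothetical, the cross-product is obtained from the sum of squared differences exactly as in the moving-slip method, and the third term of the series is taken as -2, the value that reproduces the tabulated sums (for instance Σ(X) = -2 for k = 0).

```python
from math import sqrt

def serial_correlation(u, k):
    """k-th serial correlation of the sequence u, following the tabular scheme."""
    m = len(u) - k                      # number of pairs, n - k
    x, y = u[:m], u[k:]                 # first n-k terms and last n-k terms
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sdd = sum((p - q) ** 2 for p, q in zip(x, y))   # sum of squared differences
    sxy = (sxx + syy - sdd) / 2         # identity 2*S(XY) = S(X^2) + S(Y^2) - S(X-Y)^2
    cov = sxy / m - (sx / m) * (sy / m)
    var_x = sxx / m - (sx / m) ** 2
    var_y = syy / m - (sy / m) ** 2
    return cov / sqrt(var_x * var_y)

# Series of Example 30.1, with the third term read as -2 (see the note above).
series = [-5, -6, -2, 4, 7, 3, 1, -5, -1, 2]

for k in range(1, 6):
    print(k, round(serial_correlation(series, k), 3))
```

For k = 1 and k = 5 this gives values agreeing with the hand calculation to the figures quoted.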
The Correlogram

30.14. The diagram obtained by graphing $r_k$ as ordinate against k as abscissa and
joining the points each to the next is called a correlogram. We shall give a number of
examples below and shall see that the form of the correlogram provides a method of
discriminating between the various types of oscillatory series.

30.15. Suppose, for example, that the series is generated by a moving average of
random elements with weights $a_1, a_2, \dots, a_m$. The typical term of the series is then

$$u_j = a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \dots + a_m \varepsilon_{j+m-1}. \qquad (30.9)$$

Without loss of generality we may take $E(\varepsilon) = 0$ and hence $E(u_j) = 0$. Then

$$E(u_j u_{j+k}) = E\big[\{a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \dots + a_m \varepsilon_{j+m-1}\}\{a_1 \varepsilon_{j+k} + a_2 \varepsilon_{j+k+1} + \dots + a_m \varepsilon_{j+k+m-1}\}\big],$$

and since

$$E(\varepsilon_j \varepsilon_{j+k}) = 0, \quad k \neq 0; \qquad = v, \text{ say, if } k = 0,$$

we have

$$E(u_j u_{j+k}) = (a_1 a_{k+1} + a_2 a_{k+2} + \dots + a_{m-k} a_m)\, v, \qquad (30.10)$$

provided that m > k. But if $k \geq m$ then

$$E(u_j u_{j+k}) = 0. \qquad (30.11)$$

Thus for an infinite series generated by the moving average the serial correlations vanish
for $k \geq m$, and the correlogram from that point onwards coincides with the x-axis. In
particular, if the a’s are all equal to 1/m, we have

$$E(u_j u_{j+k}) = (m - k)\,\frac{v}{m^2}$$

and hence

$$r_k = \frac{m - k}{m} = 1 - \frac{k}{m}, \qquad k \leq m, \qquad (30.12)$$

so that the correlogram consists of a straight line joining the point (0, 1) to (m, 0), together
with the x-axis from the latter point onwards.




Example 30.2

The weights of the Spencer 21-point formula are

$$\frac{1}{350}\,[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \dots]$$

Apart from the divisor 350, which may be disregarded for present purposes, the sum of
squares of weights is 17,542. The products (30.10) and the corresponding serial correlations
are as follows:

  k    Σ a_j a_{j+k}    r_k        k    Σ a_j a_{j+k}    r_k
  0       17,542        1.000     11        -930        -0.053
  1       16,786        0.957     12        -528        -0.030
  2       14,667        0.836     13        -214        -0.012
  3       11,584        0.660     14         -27        -0.002
  4        8,085        0.461     15          50         0.003
  5        4,726        0.269     16          59         0.003
  6        1,951        0.111     17          40         0.002
  7            6        0.000     18          19         0.001
  8       -1,074       -0.061     19           6         0.000
  9       -1,430       -0.082     20           1         0.000
 10       -1,298       -0.074     21           0         0.000

Fig. 30.3.—Correlogram of Series generated by the Spencer 21-point Formula (Example 30.2).

The correlogram is shown in Fig. 30.3. From k = 13 onwards the correlations are very
small, and from k = 21 onwards they vanish completely.
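
The products in (30.10) are simply lagged sums of products of the weights, so the correlogram of any moving-average scheme can be computed directly. The sketch below uses hypothetical function names; it reproduces the sum of squares quoted for the Spencer weights and the straight-line correlogram (30.12) for equal weights.

```python
def lag_product(a, k):
    """Sum of a[j] * a[j+k] over all j for which both terms exist (zero if k >= len(a))."""
    return sum(a[j] * a[j + k] for j in range(len(a) - k))

def ma_correlogram(a, max_lag):
    """Theoretical serial correlations r_0 .. r_max_lag of a moving average with weights a."""
    s0 = lag_product(a, 0)              # sum of squares of the weights
    return [lag_product(a, k) / s0 for k in range(max_lag + 1)]

# Spencer 21-point weights, omitting the divisor 350 (which cancels in r_k).
half = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57]
weights = half + [60] + half[::-1]

spencer = ma_correlogram(weights, 22)
equal = ma_correlogram([0.2] * 5, 6)    # five equal weights 1/m with m = 5

print(lag_product(weights, 0))
print([round(r, 3) for r in spencer])
print([round(r, 3) for r in equal])
```

With equal weights the values fall linearly from 1 to 0 at lag m and stay at zero thereafter, as (30.12) requires.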
30.16. Suppose now that the series consists of a sine term $A \sin\theta t$ plus $\varepsilon_t$, a random
residual. As before, we may suppose $E(u_t) = 0$, and hence

$$E(u_j u_{j+k}) = E\,\{A \sin\theta j + \varepsilon_j\}\{A \sin\theta(j+k) + \varepsilon_{j+k}\}$$

$$= A^2 E\,\{\sin\theta j \,\sin\theta(j+k)\}$$

$$= \frac{A^2}{n} \sum_{j=1}^{n} \{\sin\theta j \,\sin\theta(j+k)\} \qquad (30.13)$$

$$= \frac{A^2}{2n} \sum_{j=1}^{n} \{\cos\theta k - \cos\theta(2j+k)\}$$

$$= \frac{A^2}{2}\cos\theta k - \frac{A^2}{2n}\,\frac{\cos\theta(k+n+1)\,\sin n\theta}{\sin\theta}. \qquad (30.14)$$

Thus for large n we have effectively, unless $\theta$ is small,

$$E(u_j u_{j+k}) = \frac{A^2}{2}\cos\theta k = B\cos\theta k, \text{ say.} \qquad (30.15)$$

Similarly we find

$$E(u_j^2) = B + \operatorname{var}\varepsilon = C, \text{ say.} \qquad (30.16)$$

Hence

$$r_k = \frac{B}{C}\cos\theta k, \qquad k > 0. \qquad (30.17)$$

In short, for an infinite cyclical series the correlogram itself is a harmonic with period
equal to that of the original harmonic component.
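
The result (30.17) can be checked by simulation. The sketch below is an illustration under assumed settings: the amplitude 10, the angle π/5 and the rectangular residual on (-5, 5) follow the series of Table 30.3, while the length of 20,000 terms, the seed and the function name are arbitrary choices made here.

```python
import random
from math import sin, cos, pi, sqrt

def serial_correlation(u, k):
    """Product-moment correlation between u[j] and u[j+k]."""
    m = len(u) - k
    x, y = u[:m], u[k:]
    mx, my = sum(x) / m, sum(y) / m
    cov = sum(p * q for p, q in zip(x, y)) / m - mx * my
    vx = sum(p * p for p in x) / m - mx * mx
    vy = sum(q * q for q in y) / m - my * my
    return cov / sqrt(vx * vy)

rng = random.Random(1)                  # fixed seed so the run is repeatable
A, theta, n = 10.0, pi / 5, 20000       # n is an arbitrary "long series" length
u = [A * sin(theta * t) + rng.uniform(-5, 5) for t in range(1, n + 1)]

B = A * A / 2                           # B = A^2 / 2
C = B + 10 ** 2 / 12                    # var of a rectangular variable of range 10 is 100/12
theory = [B / C * cos(theta * k) for k in range(1, 13)]
sample = [serial_correlation(u, k) for k in range(1, 13)]

for k, (s, t) in enumerate(zip(sample, theory), start=1):
    print(k, round(s, 3), round(t, 3))
```

The sample correlations follow an undamped cosine of period 10, scaled by B/C, which is the behaviour the formula predicts.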

30.17. When the original series is the sum of several harmonic terms the formula
for $r_k$ will, in general, be the sum of harmonics, not necessarily with the same periods.
Thus the correlogram will present a sinusoidal form which will not degenerate to the x-axis
after some fixed point and will not, in fact, be damped.

30.18. Consider now the series defined by (30.3), namely

$$u_{t+2} = -a\,u_{t+1} - b\,u_t + \varepsilon_{t+2}.$$

This is a difference equation which is easily solved by the usual methods.* The general
solution of

$$u_{t+2} + a\,u_{t+1} + b\,u_t = 0 \qquad (30.18)$$

is

$$u_t = p^t (A \cos\theta t + B \sin\theta t), \qquad (30.19)$$

where

$$p = \sqrt{b}, \qquad \cos\theta = -\frac{a}{2\sqrt{b}}. \qquad (30.20)$$

Here $\sqrt{b}$ is to be taken with positive sign, and it is assumed that $4b > a^2$. We also assume
that $\sqrt{b}$ is not greater than unity. The contrary case is mathematically permissible, but
it implies that $u_t$ increases without limit, which is outside the domain of our consideration.

Consider now the series

$$u_t = \sum_{j=0}^{\infty} \xi_j\, \varepsilon_{t-j+1}, \qquad (30.21)$$

where $\xi_t$ is a particular solution of (30.19) such that $\xi_0 = 0$ and $\xi_1 = 1$, i.e. such that

$$\xi_t = \frac{2}{\sqrt{(4b - a^2)}}\; p^t \sin\theta t. \qquad (30.22)$$

On substituting (30.21) in the original equation it will be found to provide a particular
solution. The general solution is then

$$u_t = p^t (A \cos\theta t + B \sin\theta t) + \sum_{j=0}^{\infty} \xi_j\, \varepsilon_{t-j+1}. \qquad (30.23)$$

As p is not greater than unity we shall, in general, find that the first term in this expression
is damped out of existence. If we may regard our series as having been “started up”
some time prior to the point t = 0, the solution is effectively

$$u_t = \sum_{j=0}^{\infty} \xi_j\, \varepsilon_{t-j+1}. \qquad (30.24)$$

* See, for instance, Milne-Thomson, Calculus of Finite Differences, chapter 13.

30.19. In this form the autoregressive scheme is seen to be a moving average of
a component $\varepsilon$ with infinite extent and damped harmonic weights. Consider now its
correlogram. We have

$$\sum_{j=0}^{\infty} \xi_j \xi_{j+k} = \frac{4}{4b - a^2} \sum_{j=0}^{\infty} \{p^{2j+k} \sin\theta j \,\sin\theta(j+k)\}$$

$$= \frac{2p^k}{4b - a^2} \sum_{j=0}^{\infty} \big[p^{2j} \{\cos\theta k - \cos\theta(2j+k)\}\big]$$

$$= \frac{2p^k}{4b - a^2} \left\{ \frac{\cos\theta k}{1 - p^2} - \frac{\cos\theta k - p^2 \cos\theta(k-2)}{1 - 2p^2 \cos 2\theta + p^4} \right\}. \qquad (30.25)$$

Now

$$E(u_j u_{j+k}) = E\Big\{\sum_i (\xi_i\, \varepsilon_{j-i+1}) \sum_i (\xi_i\, \varepsilon_{j+k-i+1})\Big\} = \operatorname{var}\varepsilon \sum_{j=0}^{\infty} (\xi_j \xi_{j+k}). \qquad (30.26)$$

Thus

$$r_k = \frac{\operatorname{var}\varepsilon \sum_{j=0}^{\infty} \xi_j \xi_{j+k}}{\operatorname{var}\varepsilon \sum_{j=0}^{\infty} \xi_j^2},$$

which, on substitution from (30.25), reduces to

$$r_k = \frac{p^k}{(1 + p^2)\sin\theta}\,\{\sin(k+1)\theta - p^2 \sin(k-1)\theta\}. \qquad (30.27)$$

Writing

$$\tan\psi = \frac{1 + p^2}{1 - p^2}\,\tan\theta, \qquad (30.28)$$

we find

$$r_k = \frac{p^k \sin(k\theta + \psi)}{\sin\psi}, \qquad k \geq 0. \qquad (30.29)$$

From this we see that the correlogram will oscillate with period $2\pi/\theta$ but that, owing to
the factor $p^k$, it will be damped. If k is negative the formula applies, except that |k|
must be used instead of k on the right-hand side of (30.29).
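
Formula (30.29) can be evaluated for the scheme (30.4), where a = -1.1 and b = 0.5. The sketch below is illustrative: as an independent check it compares the closed form with the recursion $r_k = -a\,r_{k-1} - b\,r_{k-2}$ started from $r_0 = 1$, $r_1 = -a/(1+b)$, which is the standard relation for autocorrelations of a second-order autoregressive scheme and is introduced here as an assumption, since it is not derived in this section.

```python
from math import sqrt, acos, atan, sin, tan, pi

a, b = -1.1, 0.5                         # constants of the scheme (30.4)
p = sqrt(b)                              # damping factor
theta = acos(-a / (2 * sqrt(b)))         # cos(theta) = -a / (2 sqrt(b))
psi = atan((1 + p * p) / (1 - p * p) * tan(theta))

def r_formula(k):
    """Closed form p^k sin(k*theta + psi) / sin(psi) for k >= 0."""
    return p ** k * sin(k * theta + psi) / sin(psi)

# Recursion used as a cross-check (assumed standard result, see the note above).
r_rec = [1.0, -a / (1 + b)]
for k in range(2, 21):
    r_rec.append(-a * r_rec[k - 1] - b * r_rec[k - 2])

print("period of oscillation:", round(2 * pi / theta, 2))
for k in range(0, 21, 2):
    print(k, round(r_formula(k), 4), round(r_rec[k], 4))
```

Both routes give the same damped oscillation, with the amplitude shrinking by the factor p at each step.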

30.20. We thus reach the interesting conclusion that the three types of series
considered in 30.9, however similar to the eye, will have distinct types of correlogram,
provided that the series are long enough for the observed correlations to approach the expected
values for an infinite series. The correlogram of a series generated by moving averages,
though it may oscillate as in Example 30.2, will vanish after a certain point; that of a
series of harmonic terms will oscillate, but will not vanish or be damped; that of the
autoregressive scheme will oscillate and will not vanish, but it will be damped. The correlogram
therefore offers a theoretical basis for discriminating between the three types of oscillatory
series.

30.21. Unfortunately the series with which we have to work are very frequently 
too short to enable a decisive distinction to be made. We shall see below that divergence 
between theory and observation can be very considerable, and that sampling theory has 
not yet advanced far enough to enable us to make objective judgments in probability 
about its significance. We shall have to rely on limited experimental evidence and to 
some extent on intuitive judgment in reaching conclusions. If, therefore, the remainder 
of this chapter contains gaps in the treatment and leaves certain points undecided the 
reader will understand that the reason is ignorance rather than indifference. 

Examples of Correlograms from Observed Series

30.22. We will in the first place give the correlograms of a few of the series given 
earlier in this and the preceding chapter. 

Example 30.3 

In Table 30.2 we gave the deviations from the trend of marriage rates for the years
1843-1896. The first 20 serial correlations of this series are shown in Table 30.5 and the
correlogram in Fig. 30.4.
TABLE 30.5

Serial Correlations of the Marriage Data of Table 30.2.

Order of                      Order of
Correlation      r_k          Correlation      r_k
k.                            k.
    1           0.503            11           0.080
    2          -0.089            12          -0.130
    3          -0.498            13          -0.132
    4           0.031            14          -0.058
    5          -0.407            15          -0.095
    6          -0.025            16          -0.120
    7           0.353            17          -0.030
    8           0.390            18           0.131
    9           0.254            19           0.209
   10           0.104            20           0.205

Fig. 30.4.—Correlogram of Marriage Data of Table 30.2 (Table 30.5).


The correlogram is smooth and suggests the operation of an autoregressive scheme. 
There is little indication that a moving average, at least of extent less than 20, would account 
for the series, but on the other hand some damping appears to be present. 

Example 30.4 

Table 30.6 shows the first 60 serial correlations of the Beveridge series of Table 30.1,
the correlogram being given in Fig. 30.5.

TABLE 30.6

Serial Correlations of the Beveridge Wheat-Price Index of Table 30.1.

   k       r_k         k       r_k         k       r_k         k       r_k
   1      0.562       16      0.158       31      0.060       46     -0.036
   2      0.103       17      0.109       32     -0.008       47     -0.013
   3     -0.075       18      0.002       33     -0.039       48      0.042
   4     -0.092       19     -0.075       34      0.007       49      0.062
   5     -0.082       20     -0.062       35      0.056       50      0.065
   6     -0.136       21     -0.021       36      0.010       51      0.050
   7     -0.211       22     -0.062       37     -0.004       52      0.009
   8     -0.261       23     -0.088       38     -0.015       53     -0.027
   9     -0.192       24     -0.084       39     -0.047       54     -0.053
  10     -0.070       25     -0.076       40     -0.047       55     -0.073
  11     -0.003       26     -0.091       41      0.008       56     -0.106
  12     -0.015       27     -0.052       42      0.034       57     -0.084
  13     -0.012       28     -0.032       43      0.065       58     -0.019
  14      0.047       29     -0.012       44      0.099       59      0.003
  15      0.101       30      0.059       45      0.009       60      0.010



The correlogram here is almost certainly damped. The oscillations persist in a most 
remarkable way, notwithstanding the diminishing amplitude, and the presumption is 
a strong one that the series is of the damped type.

Example 30.5

In Table 29.8 (page 386) we gave the residuals of a sheep-population series for the 
years 1871 to 1935. Table 30.7 shows the first 30 serial correlations of this series and 
Fig. 30.6 the correlogram. Again the correlogram is oscillatory, but the damping is not 
so clear. 


TABLE 30.7

Serial Correlations of the Sheep Data of Table 29.8.

   k       r_k         k       r_k         k       r_k
   1      0.595       11     -0.142       21     -0.381
   2     -0.151       12     -0.172       22     -0.118
   3     -0.601       13     -0.186       23      0.173
   4     -0.537       14     -0.128       24      0.343
   5     -0.138       15      0.052       25      0.352
   6      0.144       16      0.276       26      0.154
   7      0.203       17      0.439       27     -0.203
   8      0.118       18      0.293       28     -0.456
   9      0.006       19     -0.074       29     -0.415
  10     -0.078       20     -0.359       30     -0.184

Fig. 30.6. Correlogram of the Sheep Population Data of Table 29.8 (Table 30.7).

Significance of a Correlogram

30.23. The foregoing examples illustrate one of the main difficulties we have to face 
in correlogram analysis. On intuitive grounds we seem to be justified in rejecting the 
scheme of moving averages as a possible scheme for the series of these examples, since the 
oscillations in the correlograms persist; but we can no doubt find moving averages which
will produce such correlograms, though their extents would have to be long (over 60 in 
the case of the Beveridge series) and their weights artificial. The only final test seems to 
be to ascertain such a moving average and then to examine whether it will predict further 
terms in the series if such can be observed. 


30.24. Distinction between the scheme of harmonic components and the autoregressive
scheme is even more difficult for short series, since the correlograms for the
latter do not damp out according to expectation. Consider in fact an autoregressive
scheme of the simple linear type (30.3). There will be the usual variation in length from
peak to peak and in amplitude; but if the section of the series is a comparatively short
one, covering, say, four or five oscillations, the oscillations will not have time to get very
much out of step and the serial correlations will be systematically larger than one would
expect for an infinite series. This effect is exhibited in Table 30.8 and Fig. 30.7, which
give the serial correlations and the correlogram for the series of Table 30.4, given by the
formula

$$u_{t+2} = 1.1\,u_{t+1} - 0.5\,u_t + \varepsilon_{t+2}.$$

Here the damping factor $p = \sqrt{b} = 0.7071$, and by the thirtieth correlation $r_k$ should be
very small, less than 0.002 in absolute magnitude. Actually it is 100 times as large. The
mere fact that an observed correlogram for a short series fails to damp very rapidly is
not, therefore, a very definite indication that the series is not ruled by the autoregressive
scheme. On the contrary, failure to damp may be expected.
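The size of the theoretical correlations can be checked with a short sketch. It assumes the scheme is written $u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$ (so that $a = -1.1$, $b = 0.5$ here) and that the serial correlations obey the same difference equation with $r_0 = 1$ and $r_1 = -a/(1+b)$; the function name is illustrative only.

```python
import math


def theoretical_correlogram(a, b, max_lag):
    """Theoretical r_0..r_max_lag for u[t+2] + a*u[t+1] + b*u[t] = eps[t+2].

    Assumes r_k + a*r_(k-1) + b*r_(k-2) = 0 for k >= 2, with r_0 = 1
    and r_1 = -a / (1 + b).
    """
    r = [1.0, -a / (1.0 + b)]
    for k in range(2, max_lag + 1):
        r.append(-a * r[k - 1] - b * r[k - 2])
    return r[: max_lag + 1]


r = theoretical_correlogram(a=-1.1, b=0.5, max_lag=30)
damping = math.sqrt(0.5)  # p = sqrt(b)
# The theoretical r_30 is of the order of p**30, far below the observed value.
print(abs(r[30]), damping ** 30)
```

Comparing the printed magnitude with the observed value of $r_{30}$ in Table 30.8 shows how far a short realised series can depart from the theoretical correlogram.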

30.25. We are on firmer ground when considering the significance of a correlogram
in the sense of judging whether it can be derived from a random series.

(a) The variance of $r_k$ in a random series of $n$ terms is approximately $\frac{1}{n-k}$, provided
that $n$ is large. For

$$E\left\{\frac{1}{n-k}\sum_j (x_j x_{j+k})\right\}^2 = \left(\frac{1}{n-k}\right)^2 E\left\{\sum_j x_j^2 x_{j+k}^2 + 2\sum_{j \ne m} x_j x_{j+k} x_m x_{m+k}\right\} = \frac{1}{n-k}\,\operatorname{var}^2 x.$$

Hence, for large samples,

$$\operatorname{var} r = \frac{1}{n-k}\,\frac{\operatorname{var}^2 x}{\operatorname{var}^2 x} = \frac{1}{n-k}. \tag{30.30}$$


R. L. Anderson (1942) has recently given exact results for the significance of a serial 
correlation. 

TABLE 30.8

Serial Correlations of the Artificial Series of Table 30.4.

   k       r_k         k       r_k         k       r_k
   1      0.70        11     -0.05        21      0.05
   2      0.29        12     -0.17        22     -0.12
   3      0.01        13     -0.27        23     -0.28
   4     -0.17        14     -0.31        24     -0.43
   5     -0.27        15     -0.30        25     -0.57
   6     -0.25        16     -0.18        26     -0.50
   7     -0.13        17      0.12        27     -0.20
   8      0.07        18      0.29        28      0.02
   9      0.12        19      0.33        29      0.17
  10      0.05        20      0.22        30      0.27

(b) For our purposes, however, the important point is not whether a particular serial
coefficient is significant, but whether the oscillatory character of the correlogram as a whole
is so. Here we have to form an intuitive judgment, but it can hardly be doubted that
the undulations in Figs. 30.4 to 30.6 are not accidental. Something exists to be explained
as a systematic effect, though what that effect is may be more difficult to decide.
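As a rough aid to point (a), the following sketch computes a serial correlation and the approximate standard error $1/\sqrt{n-k}$ for a random series. The definition of $r_k$ used here (the product-moment correlation between the first $n-k$ and the last $n-k$ terms) is an assumption, since the defining formula lies outside this section; the function names are illustrative.

```python
import math


def serial_correlation(x, k):
    """Product-moment correlation between x[0..n-k-1] and x[k..n-1]."""
    n = len(x)
    if not 0 < k < n - 1:
        raise ValueError("lag k must satisfy 0 < k < n - 1")
    head, tail = x[: n - k], x[k:]
    m = n - k
    mean_head, mean_tail = sum(head) / m, sum(tail) / m
    cov = sum((h - mean_head) * (t - mean_tail) for h, t in zip(head, tail))
    var_head = sum((h - mean_head) ** 2 for h in head)
    var_tail = sum((t - mean_tail) ** 2 for t in tail)
    return cov / math.sqrt(var_head * var_tail)


def approx_standard_error(n, k):
    """Square root of the large-sample variance 1/(n - k) of r_k."""
    return 1.0 / math.sqrt(n - k)
```

For a series of 54 terms, for example, `approx_standard_error(54, 1)` gives the yardstick against which a single first-order coefficient might be judged; it says nothing about the oscillatory character of the correlogram as a whole, which is the concern of point (b).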

30.26. We shall proceed to study the autoregressive scheme and the scheme of 
cyclical components in more detail, without prejudice for the time being to the question 
as to which is the better representation in particular cases. This latter is not, in fact, 
entirely a statistical matter, and we shall return to it in 30.39.

The Autoregressive Scheme

30.27. We consider in the first instance the simplified scheme of equation (30.3).
The theoretical correlogram for a series generated by this equation is of the damped type
given by (30.29),

$$r_k = \frac{p^k \sin(k\theta + \psi)}{\sin\psi},$$

where $2\pi/\theta$ is the autoregressive period of the regression equation and is given by

$$\cos\theta = -\frac{a}{2\sqrt{b}}.$$

The typical series of this kind has no "period" in the strict sense. The lengths from
peak to peak or from upcross to upcross vary in the characteristic way. It appears from
experiment (but has not, I think, been shown theoretically) that the distribution of distances
from peak to peak is of the unimodal type with a central value somewhere near
the mean distance between peaks; and similarly for troughs and upcrosses. In speaking
of the "period" of an autoregressive series we mean the central value of one of these
distributions. The question we have now to consider is whether this period is the same
as the autoregressive period $2\pi/\theta$ of the regression equation.


30.28. We have seen in 29.26 that the mean distance between upcrosses of the
series generated by the moving average whose weights are $\xi_1, \ldots, \xi_m$ is given by $2\pi/\phi$,
say, where

$$\cos\phi = \frac{\sum_{j=1}^{m-1} \xi_j \xi_{j+1}}{\sum_{j=1}^{m} \xi_j^2}.$$

Substituting for $\xi$ from (30.22) and using (30.25), we find

$$\cos\phi = \frac{\dfrac{2p}{4b - a^2}\left\{\dfrac{\cos\theta}{1 - p^2} - \dfrac{\cos\theta\,(1 - p^2)}{1 - 2p^2\cos 2\theta + p^4}\right\}}{\dfrac{2}{4b - a^2}\left\{\dfrac{1}{1 - p^2} - \dfrac{1 - p^2\cos 2\theta}{1 - 2p^2\cos 2\theta + p^4}\right\}} = \frac{2p\cos\theta}{1 + p^2} = -\frac{a}{1 + b}. \tag{30.31}$$

Thus the mean period as defined by upcrosses is

$$\frac{2\pi}{\arccos\left(-\dfrac{a}{1+b}\right)}, \tag{30.32}$$

whereas that for the autoregressive period of the equation is

$$\frac{2\pi}{\arccos\left(-\dfrac{a}{2\sqrt{b}}\right)}. \tag{30.33}$$

30.29. The mean period between upcrosses is thus not the same as the autoregressive
period. The two are very close for many of the values of $a$ and $b$ arising in practice. For
instance, when $b = 1$ they are identical; when $a = -1$, $b = 0.5$ their ratio is 1.07. One
might infer that an estimate of the period of an autoregressive scheme can be obtained
from the correlogram, but this generalisation requires some important qualifications.

(a) Firstly, the ratio of (30.33) to (30.32) is not necessarily close to unity for values
of $b$ in the neighbourhood of $a^2/4$, i.e. when $b$ is small and the autoregressive period is long.
Consider, for instance, the series generated by

$$u_{t+2} = 1.2\,u_{t+1} - 0.4\,u_t + \varepsilon_{t+2}.$$

We have

$$\cos\theta = -\frac{a}{2\sqrt{b}} = \frac{1.2}{2\sqrt{0.4}} = 0.9499, \qquad \theta = 18.2^\circ, \quad \text{period} = 19.7 \text{ units}.$$

However, for $\phi$,

$$\cos\phi = \frac{1.2}{1.4} = 0.8571, \qquad \phi = 31^\circ, \quad \text{period} = 11.6 \text{ units}.$$

The mean distance between upcrosses, and a fortiori that between peaks, is very much 
shorter than the autoregressive period. 

(b) The mean distance between upcrosses may miss certain oscillations above or
below the x-axis, so that it overestimates the period between peaks or troughs. On the
other hand, the latter may include ripples on the main wave which we wish to ignore.
The reader can verify for himself, by constructing an autoregressive series by some such
formula as the above, how difficult it is to draw the line in particular cases. The difficulty,
however, must be faced, for it is precisely the kind which we meet in dealing with observed
series.

(c) Owing to the appearance of the phase angle $\psi$ in equation (30.29) the starting-point
of the correlogram ($k = 0$) is not to be regarded as a maximum. The period of the
correlogram is therefore to be calculated either by ignoring this point or by reference to
distances between troughs and upcrosses in the correlogram.
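The two periods compared in (30.32) and (30.33) are easily evaluated for trial values of the constants. The sketch below assumes the scheme is written $u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$ with $4b > a^2$, so that both inverse cosines exist; the function names are illustrative.

```python
import math


def _check(a, b):
    # Oscillatory case only: complex roots require 4b > a^2.
    if not (b > 0 and 4 * b > a * a):
        raise ValueError("need b > 0 and 4b > a^2")


def autoregressive_period(a, b):
    """Period 2*pi/theta with cos(theta) = -a / (2*sqrt(b)), as in (30.33)."""
    _check(a, b)
    return 2 * math.pi / math.acos(-a / (2 * math.sqrt(b)))


def mean_upcross_period(a, b):
    """Mean distance between upcrosses, cos(phi) = -a / (1 + b), as in (30.32)."""
    _check(a, b)
    return 2 * math.pi / math.acos(-a / (1 + b))


ratio = autoregressive_period(-1.0, 0.5) / mean_upcross_period(-1.0, 0.5)
print(round(ratio, 2))
```

Evaluating both functions over a grid of $a$ and $b$ is a quick way to see where the two periods part company, namely as $b$ approaches $a^2/4$.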


30.30. The equation

$$u_{t+2} + a\,u_{t+1} + b\,u_t = \varepsilon_{t+2}$$

may be regarded as expressing the regression of $u_{t+2}$ on $u_{t+1}$ and $u_t$, the term $\varepsilon_{t+2}$ being
a residual error. We may therefore estimate the constants $a$ and $b$ from the regression
equation of the observed series in the usual way. If we assume that the series is long enough
for end effects to be negligible in determining the variances of the finite series, then
$\operatorname{var} u_{t+2} = \operatorname{var} u_{t+1} = \operatorname{var} u_t$, and from the usual formulae for regressions we find

$$a = -\frac{r_1(1 - r_2)}{1 - r_1^2}, \tag{30.34}$$

$$b = -\frac{r_2 - r_1^2}{1 - r_1^2}. \tag{30.35}$$

This gives us the constants of the autoregressive scheme from the serial correlations.

It should, however, be realised that these estimates are rather sensitive to superposed
error of the type we refer to below (30.32), and it is therefore unsafe to estimate the
autoregressive period from them. The correlogram itself appears to be a safer guide on
this matter.
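A minimal sketch of the calculation in (30.34) and (30.35), followed by the autoregressive period, is given below. It assumes the sign convention $u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$ and $\cos\theta = -a/(2\sqrt{b})$; the function names are illustrative, and the trial values of $r_1$ and $r_2$ are the first two serial correlations of Table 30.7.

```python
import math


def regression_constants(r1, r2):
    """Constants a, b of the scheme from the first two serial correlations."""
    a = -r1 * (1 - r2) / (1 - r1 ** 2)
    b = -(r2 - r1 ** 2) / (1 - r1 ** 2)
    return a, b


def period_from_constants(a, b):
    """Autoregressive period 2*pi/theta, with cos(theta) = -a / (2*sqrt(b))."""
    return 2 * math.pi / math.acos(-a / (2 * math.sqrt(b)))


a, b = regression_constants(0.595, -0.151)
period = period_from_constants(a, b)
print(round(a, 3), round(b, 3), round(period, 1))
```

Because both constants depend on $r_1$ and $r_2$ only, any disturbance of these two coefficients passes straight through to the estimated period.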


Example 30.6

Consider again the sheep data of Table 30.7 and Fig. 30.6. Suppose we have decided,
from the appearance of the correlogram, to attempt to represent the series by an autoregressive
scheme.

In the first place, we have to inquire whether a scheme of the simple linear form (30.3)
is likely to be adequate. Would it, for example, be better to consider the more general form

$$u_{t+3} + a\,u_{t+2} + b\,u_{t+1} + c\,u_t = \varepsilon_{t+3},$$

or need we take into account curvilinear regressions such as

$$u_{t+2} + a\,u_{t+1} + a'\,u_{t+1}^2 + b\,u_t + b'\,u_t^2 = \varepsilon_{t+2}\,?$$

The first point can be elucidated by the use of partial and multiple correlations. The
following are the partial coefficients and the function of the multiple correlation $1 - R^2$
as determined by the continued product of $(1 - r^2)$ (cf. vol. I, equation 15.45,
p. 380):


Order of Partial     Value of Partial     Product of
Correlation          Correlation          (1 - r^2)
12                        0.595             0.6460
13.2                     -0.782             0.2509
14.23                     0.097             0.2485
15.234                   -0.183             0.2402
16.2345                   0.031             0.2400
17.23456                  0.014             0.2400


Evidently no appreciable gain in representation is to be obtained by taking the regression 
on more than the two preceding terms. 
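Partial coefficients of this kind can be obtained from the serial correlations alone by a standard recursion (usually called the Durbin-Levinson recursion), which is swapped in here as a convenient route and is not claimed to be the method used for the table. The sketch assumes a long stationary series, so that the correlation between terms $k$ apart is $r_k$ wherever they fall; the function names are illustrative.

```python
def partial_serial_correlations(r):
    """Partial correlations of orders 1..len(r) from r = [r_1, r_2, ...].

    Standard recursion on the serial correlations, assuming the correlation
    between two terms depends only on their distance apart.
    """
    partials = []
    prev = []  # regression coefficients on the k-1 preceding terms
    for k in range(1, len(r) + 1):
        if k == 1:
            last = r[0]
            coeffs = [last]
        else:
            num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
            last = num / den
            coeffs = [prev[j] - last * prev[k - 2 - j] for j in range(k - 1)]
            coeffs.append(last)
        partials.append(last)
        prev = coeffs
    return partials


def continued_products(partials):
    """Running product of (1 - r^2) over the partial coefficients."""
    out, prod = [], 1.0
    for p in partials:
        prod *= 1.0 - p * p
        out.append(prod)
    return out


sheep_r = [0.595, -0.151, -0.601, -0.537, -0.138, 0.144]  # Table 30.7, k = 1..6
partials = partial_serial_correlations(sheep_r)
products = continued_products(partials)
```

The point of the running product is that it stops falling once further preceding terms cease to improve the regression.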

The possibility of better representation by curvilinear regressions may be
considered by drawing the scatter diagrams of $u_t$ on $u_{t+1}$ and $u_t$ on $u_{t+2}$. These are
shown in Fig. 30.8. It seems clear that there is an essential scatter in the data which no
ordinary polynomial can represent, and that curvilinear terms are unlikely to add anything
material to the linear regressions.

Fig. 30.8. Scatter Diagrams of $u_t$ on $u_{t+1}$ (top figure), and $u_t$ on $u_{t+2}$ (bottom figure).

We conclude that if the data are of the autoregressive type it is unnecessary to consider
any more elaborate scheme than the simple type

$$u_{t+2} + a\,u_{t+1} + b\,u_t = \varepsilon_{t+2}.$$

For this series we have

$$r_1 = 0.595, \qquad r_2 = -0.151.$$

Hence

$$a = -\frac{r_1(1 - r_2)}{1 - r_1^2} = -1.060, \qquad b = -\frac{r_2 - r_1^2}{1 - r_1^2} = 0.782.$$

The autoregression equation is

$$u_{t+2} = 1.060\,u_{t+1} - 0.782\,u_t + \varepsilon_{t+2}.$$

For the autoregressive period we have

$$\cos\theta = \frac{1.060}{2\sqrt{0.782}} = 0.600, \qquad \theta = 53.2^\circ,$$

and hence the period is $\frac{360}{53.2} = 6.8$ years.


Now in the correlogram (Fig. 30.6) there are peaks at k = 7, 17 and 25, giving a period
of about 9 years; and there are troughs at k = 3, 13, 21 and 28, giving a mean period
of 8.3 years. The autoregressive period as estimated from the correlogram is then between
8 and 9 years, whereas that given by the autoregression equation is 6.8 years, considerably
shorter.

Using the values of $a$ and $b$ found above, we have for the mean distance between
upcrosses,

$$\cos\phi = \frac{1.060}{1.782} = 0.5948, \qquad \phi = 53.5^\circ,$$

giving a mean distance practically equal to the autoregressive period as shown by the
regression equation.

Finally, looking to the original series, we see that there are nine major peaks, the
first in 1874 and the last in 1932, so that the mean distance between peaks is 58/8 = 7.25
years; and nine upcrosses, the first between 1872 and 1873 and the last between 1930 and
1931, so that the mean distance between upcrosses is 58/8 = 7.25 years, the same as for peaks.
The upcross at 1876-7, however, is due to a temporary fall below the zero line, and had it
not occurred we should have found a mean distance of 8.3 years.

We have therefore reached this position: the mean period in the series itself appears
to be about 7.25 years; that given by the regression constants is 6.8 years; and that given
by the correlogram is about 8.5 years. These figures are scarcely close enough for comfort,
and further data would be required to arrive at a more accurate estimate of the mean
period. Nevertheless, they illustrate very well the kind of divergence which appears to 
be more the rule than the exception in dealing with short series. We should expect the 
correlogram to give a higher value than the series itself, for there may appear peaks or 
upcrosses in the latter which are purely temporary fluctuations due to the casual element. 
On the other hand, the regression constants appear to give consistently lower values for 
the autoregressive period than the correlogram, an effect found by Yule (1927a) for sunspots, 
Wold (1938a) for cost-of-living indices, and Kendall (1944a) in series of agricultural prices, 
acreage and livestock populations. 


30.31. Let us examine more closely the effect referred to at the end of the previous
example. Our autoregressive system is based on a random element $\varepsilon_t$ which is added to
the term $u_{t+2}$. We can therefore regard the value at time $t+2$ as composed of two parts,
a systematic element expressed by $a\,u_{t+1} + b\,u_t$, giving the effect of the past history of the
system at times $t+1$ and $t$, together with a new random element peculiar to the moment.
This latter is random in the sense that it is casual and unpredictable; but once it has
occurred it is incorporated into the motion of the system and exerts an influence on future
history. It is therefore quite unlike an error of observation or a sampling error which
distorts the value of a particular member but does not affect the others.

Now suppose that such an error of observation is present, and let us represent it by
$\eta_t$. For long series this element will increase the variance of the observed values by $\operatorname{var}\eta$,
but if it is independent of the remaining constituents of the series it will not affect the
covariances. Hence the serial correlations will all be reduced in a constant proportion $c$,
except of course $r_0$; and this, as we proceed to show, will affect the autoregressive period
as derived from the regression constants, in general shortening the period quite considerably.


30.32. If $r_1$ is reduced to $cr_1$ and $r_2$ to $cr_2$, the constants of the regression equations
are, from (30.34) and (30.35),

$$\frac{cr_1(1 - cr_2)}{1 - c^2 r_1^2} = -a', \tag{30.36}$$

$$\frac{cr_2 - c^2 r_1^2}{1 - c^2 r_1^2} = -b'. \tag{30.37}$$

The estimated autoregressive period is then $2\pi/\theta'$, where $\theta'$ is given by

$$\cos\theta' = -\frac{a'}{2\sqrt{b'}} = \frac{cr_1(1 - cr_2)}{2\sqrt{(1 - c^2 r_1^2)(c^2 r_1^2 - cr_2)}}.$$

Differentiating the logarithm of this expression and putting $c = 1$, we find

$$-2\tan\theta\left(\frac{d\theta'}{dc}\right)_{c=1} = 1 - \frac{2r_2}{1 - r_2} + \frac{2r_1^2}{1 - r_1^2} + \frac{r_1^2}{r_2 - r_1^2},$$

which reduces to

$$-\tan\theta\left(\frac{d\theta'}{dc}\right)_{c=1} = \frac{(1 + b)(3b^2 + b - a^2)}{2b\{(1 + b)^2 - a^2\}}. \tag{30.38}$$

Now $\tan\theta = -\dfrac{\sqrt{4b - a^2}}{a}$ and the period $P = 2\pi/\theta$. We then find

$$\left(\frac{dP}{dc}\right)_{c=1} = -\frac{P^2\,a\,(1 + b)(3b^2 + b - a^2)}{4\pi b\,\{(1 + b)^2 - a^2\}\sqrt{4b - a^2}}. \tag{30.39}$$

This equation gives us an approximate idea of the change in the period $P$ for small
changes in $c$ near $c = 1$. For instance, with $a = -1.5$, $b = 0.9$ we find $P = 9.7$ units,
and from (30.39),

$$\left(\frac{dP}{dc}\right)_{c=1} \approx 16.5.$$

Thus, if $c = 0.9$, i.e. the variance of $\eta$ is about 10 per cent of the total, the period will be
reduced by about 1.65 years, a substantial amount.
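The effect of the attenuation factor $c$ can also be followed by direct recomputation. The sketch below assumes the scheme $u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$, for which $r_1 = -a/(1+b)$ and $r_2 = -a r_1 - b$; it reduces both correlations by $c$, recomputes the constants as in (30.36) and (30.37), and compares the slope of the period at $c = 1$ with the closed expression (30.39) as reconstructed here with the sign convention $\cos\theta = -a/(2\sqrt{b})$. The function names are illustrative.

```python
import math


def attenuated_period(a, b, c):
    """Period implied by the regression constants when r_1, r_2 are scaled by c."""
    r1 = -a / (1 + b)
    r2 = -a * r1 - b
    a_dash = -c * r1 * (1 - c * r2) / (1 - c ** 2 * r1 ** 2)
    b_dash = (c ** 2 * r1 ** 2 - c * r2) / (1 - c ** 2 * r1 ** 2)
    theta = math.acos(-a_dash / (2 * math.sqrt(b_dash)))
    return 2 * math.pi / theta


def slope_at_unity(a, b):
    """dP/dc at c = 1 from the closed expression of the form (30.39)."""
    period = attenuated_period(a, b, 1.0)
    num = -period ** 2 * a * (1 + b) * (3 * b ** 2 + b - a ** 2)
    den = 4 * math.pi * b * ((1 + b) ** 2 - a ** 2) * math.sqrt(4 * b - a ** 2)
    return num / den


step = 1e-6
finite_difference = (
    attenuated_period(-1.5, 0.9, 1.0) - attenuated_period(-1.5, 0.9, 1.0 - step)
) / step
print(finite_difference, slope_at_unity(-1.5, 0.9))
```

The slope describes only the behaviour close to $c = 1$. Calling `attenuated_period(-1.5, 0.9, 0.9)` gives the exact recomputed period for a 10 per cent error variance, which need not agree closely with the linear estimate because the dependence on $c$ is not linear; figures computed this way may also differ slightly from those quoted in the text through rounding of $P$.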


30.33. It is thus possible that the observed discrepancies between the autoregressive
periods as given by the regression constants and the correlogram may be due to superposed
random fluctuation which is not incorporated into the autoregressive scheme. This is
not the only possible explanation; for instance, in particular cases the disturbance function
$\varepsilon$ may not be random. The hypotheses to be considered in such a case, however, are so
complex that it is difficult to pursue a quantitative investigation without a wealth of
material; and this, unfortunately, is usually denied to us, at least in economic work.
Meteorological data are more numerous, and we may hope that further light will be thrown
on the autoregressive scheme by a re-examination of the material available in this field.

30.34. Consider now the more extended autoregression equation

$$u_{t+m} + a_1 u_{t+m-1} + a_2 u_{t+m-2} + \ldots + a_m u_t = \varepsilon_{t+m}. \tag{30.40}$$

The explicit solution cannot be given in the simple form available when $m = 2$. It has,
in general, the solution

$$u_t = A_1 \alpha_1^t + A_2 \alpha_2^t + \ldots + A_m \alpha_m^t + B, \tag{30.41}$$

where $\alpha_1, \ldots, \alpha_m$ are the roots of

$$\alpha^m + a_1 \alpha^{m-1} + a_2 \alpha^{m-2} + \ldots + a_m = 0, \tag{30.42}$$

and $B$ is a particular integral involving the $\varepsilon$'s. For the series to be oscillatory without
increasing indefinitely no term such as $\alpha^t$, where $\alpha$ is real and greater than unity, can appear.
Assuming this to be so, and assuming further that the series was "started up" some time
before $t = 0$, we reduce the solution to the particular integral $B$.

Choose a particular value $\xi_t$ of $\sum_{j=1}^{m} A_j \alpha_j^t$ such that

$$\begin{aligned}
\xi_0 &= 0, \\
\xi_1 + a_1 \xi_0 &= 1, \\
\xi_2 + a_1 \xi_1 + a_2 \xi_0 &= 0, \\
&\;\;\vdots \\
\xi_{m-1} + a_1 \xi_{m-2} + \ldots + a_{m-1} \xi_0 &= 0.
\end{aligned} \tag{30.43}$$

This is always possible in general, for it imposes $m$ conditions on the $m$ constants $A$. Then
it will be found on substitution that a particular integral $B$ is given by

$$B = \sum_{j=0}^{\infty} \xi_j\,\varepsilon_{t-j+1}, \tag{30.44}$$

a generalisation of (30.24). Our series may then be regarded as generated by a moving
average of infinite extent, the weights being combinations of damped harmonic and
exponential terms.
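The weights $\xi_j$ of this infinite moving average can be generated numerically. The sketch below starts from $\xi_0 = 0$, $\xi_1 = 1$ and assumes that each later weight satisfies the homogeneous form of (30.40), which is what the conditions (30.43) amount to for the early terms. For $m = 2$ the result is compared with the damped-harmonic form $\xi_j = \frac{2}{\sqrt{4b - a^2}}\,p^j \sin j\theta$, which is taken here as an assumption about (30.22), since that equation lies outside this section. The function names are illustrative.

```python
import math


def moving_average_weights(coeffs, count):
    """Weights xi_0..xi_(count-1) for u[t+m] + a_1*u[t+m-1] + ... + a_m*u[t] = eps[t+m].

    coeffs = [a_1, ..., a_m]; uses xi_0 = 0, xi_1 = 1 and the homogeneous
    difference equation for every later weight.
    """
    xi = [0.0, 1.0]
    for j in range(2, count):
        total = 0.0
        for i, a_i in enumerate(coeffs, start=1):
            if j - i >= 0:
                total += a_i * xi[j - i]
        xi.append(-total)
    return xi[:count]


def damped_harmonic_weight(a, b, j):
    """Assumed closed form for m = 2 (requires 4b > a^2)."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    return 2.0 / math.sqrt(4 * b - a * a) * p ** j * math.sin(j * theta)


weights = moving_average_weights([-1.1, 0.5], 21)
```

For $m = 2$ the recursion and the closed form agree term by term, which illustrates the remark that the weights are damped harmonic terms.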

30.35. The correlogram of such a series may be determined by the following method,
due to Walker (1931). Multiply (30.40) by $u_{t-k}$ and sum. We find

$$r_{k+m} + a_1 r_{k+m-1} + a_2 r_{k+m-2} + \ldots + a_m r_k = \frac{\operatorname{cov}(\varepsilon_{t+m},\,u_{t-k})}{\operatorname{var} u}. \tag{30.45}$$

Now $u_{t-k}$ depends only on $\varepsilon_{t-k}$ and terms with lower subscripts and hence is uncorrelated
with $\varepsilon_{t+m}$ for $k > -m$. Thus we have

$$r_{k+m} + a_1 r_{k+m-1} + \ldots + a_m r_k = 0, \qquad k > -m. \tag{30.46}$$

If we multiply (30.40) by $u_{t+k+m}$ we find similarly

$$r_k + a_1 r_{k+1} + \ldots + a_m r_{k+m} = \frac{\operatorname{cov}(\varepsilon_{t+m},\,u_{t+k+m})}{\operatorname{var} u}, \tag{30.47}$$

but the expression on the right no longer vanishes. In fact $u_{t+k+m}$ contains the term
$\xi_{k+1}\,\varepsilon_{t+m}$, and hence

$$r_k + a_1 r_{k+1} + \ldots + a_m r_{k+m} = \xi_{k+1}\,\frac{\operatorname{var}\varepsilon}{\operatorname{var} u}, \qquad k > -m. \tag{30.48}$$

From (30.46) it follows that the serial correlation $r_k$ will be given by

$$r_k = \sum_j (A_j \alpha_j^k), \tag{30.49}$$

where the $\alpha$'s are the roots of (30.42) and the $A$'s are constants to be determined from initial
conditions. Thus the correlogram will be the sum of terms which either decay exponentially
to zero ($\alpha$ real) or oscillate with a similar decay to zero ($\alpha$ complex). Walker (1931) has
used this result in an inquiry into a series of atmospheric pressures.

The Autocorrelation Function 

30.36. If we have a series $u(t)$ defined at every point of time in some range $-h$
to $+h$, we may define its variance as

$$\frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt \tag{30.50}$$

on the assumption that the mean value is zero, which does not limit our generality. Suppose
the series is reduced to standard measure by dividing throughout by the square root
of this variance. Then an evident generalisation of the serial correlation is given by

$$r(k) = \frac{1}{2h}\int_{-h}^{h} u(t)\,u(t+k)\,dt. \tag{30.51}$$

We shall call this the autocorrelation function. We can likewise regard it as defined when
$h$ tends to infinity, provided that the limit on the right in (30.51) exists. It is to be noted
that $r(k)$ is in that case an even function of $k$.


30.37. We shall also consider the function

$$R(k) = \int_{-\infty}^{\infty} u(t)\,u(t+k)\,dt,$$

when it exists. We have

$$\int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{ikp}\,u(t)\,u(t+k)\,dt\,dk = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{ip(t+k)}\,u(t+k)\,e^{-ipt}\,u(t)\,dt\,dk.$$

The simple substitution $t + k = q$ reduces this to

$$\int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq \int_{-\infty}^{\infty} e^{-ipt}\,u(t)\,dt. \tag{30.52}$$

Thus, if we write

$$\alpha(p) + i\beta(p) = \int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq, \tag{30.53}$$

we have

$$\int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk = \alpha^2(p) + \beta^2(p). \tag{30.54}$$

It follows, as is otherwise evident from the fact that $R(k)$ is an even function, that the
imaginary part on the left of (30.54) vanishes, and we have

$$\int_{-\infty}^{\infty} R(k)\cos kp\,dk = \alpha^2(p) + \beta^2(p). \tag{30.55}$$



If, following the notation of characteristic functions, we write $\phi_R(p)$ for the integral on
the left in (30.54) and $\phi_u(p)$ for that on the right in (30.53), we have

$$\phi_R(p) = |\phi_u(p)|^2. \tag{30.56}$$

We may then put

$$\phi_u(p) = \sqrt{\phi_R}\;e^{i\mu}, \tag{30.57}$$

where $\mu$ is an arbitrary real function. We shall then have

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_u(p)\,e^{-itp}\,dp = \frac{1}{2\pi}\int_{-\infty}^{\infty} \sqrt{\phi_R}\,\exp(i\mu - itp)\,dp. \tag{30.58}$$

Since $u(t)$ must be real, the imaginary part vanishes and this is equivalent to

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \sqrt{\phi_R}\,\cos(\mu - tp)\,dp, \tag{30.59}$$

and $\mu$ must be an odd function of $p$. The result is due to Wiener (1930). It shows that
the autocorrelation function $R$ does not uniquely determine $u(t)$ because of the arbitrary
function $\mu$.


30.38. Consider now the autocorrelation function $r(k)$ as defined in (30.51). Let
us regard the series as defined but equal to zero outside the range $-h$ to $+h$.
Then we have

$$2h\,r(k) = \int_{-h}^{h} u(t)\,u(t+k)\,dt = \int_{-\infty}^{\infty} u(t)\,u(t+k)\,dt = R(k), \tag{30.60}$$

where $R$ and $r$ are zero outside the range $-2h$ to $+2h$. The foregoing results then continue
to hold with some modifications concerning factors in $2h$. If we write

$$\phi_r(p) = \int_{-2h}^{2h} r(k)\,e^{ikp}\,dk = \frac{1}{2h}\int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk \tag{30.61}$$

and

$$\phi_u(p) = \int_{-h}^{h} u(t)\,e^{itp}\,dt = \int_{-\infty}^{\infty} u(t)\,e^{itp}\,dt, \tag{30.62}$$

then corresponding to (30.56) we have

$$2h\,\phi_r(p) = |\phi_u(p)|^2. \tag{30.63}$$

We may now let $h$ tend to infinity and observe that the results continue to hold under
certain general conditions, provided that the limits exist.
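The relation between the transform of the lagged-product function and the squared modulus of the transform of the series, as in (30.56), has an exact counterpart for a finite sequence regarded as zero outside its range. The sketch below is a discrete analogue under that assumption, with sums in place of integrals; it is not the continuous result itself, and the function names are illustrative.

```python
import cmath


def lagged_product(u, k):
    """Discrete analogue of R(k): sum of u[t]*u[t+k], with u zero outside its range."""
    n = len(u)
    return sum(u[t] * u[t + k] for t in range(max(0, -k), min(n, n - k)))


def transform_of_lagged_product(u, p):
    """Sum over k of R(k) * exp(i*k*p), for k from -(n-1) to n-1."""
    n = len(u)
    return sum(lagged_product(u, k) * cmath.exp(1j * k * p) for k in range(-(n - 1), n))


def squared_modulus_of_transform(u, p):
    """|sum over t of u[t] * exp(i*t*p)|**2."""
    return abs(sum(x * cmath.exp(1j * t * p) for t, x in enumerate(u))) ** 2


sample = [0.3, -1.2, 0.8, 2.0, -0.5]
left = transform_of_lagged_product(sample, 0.7)
right = squared_modulus_of_transform(sample, 0.7)
```

The imaginary part of `left` vanishes to rounding error because the lagged-product function is even in $k$, and the squared modulus carries no information about phase, which is the discrete face of the remark that $R$ does not determine $u(t)$ uniquely.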


Example 30.7

Consider the series

$$u(t) = A_1 \sin(\lambda_1 t + \alpha_1) + A_2 \sin(\lambda_2 t + \alpha_2) + \ldots + A_m \sin(\lambda_m t + \alpha_m).$$

For the variance we have

$$\lim \frac{1}{2h}\int_{-h}^{h} u^2\,dt = \lim \frac{1}{2h}\int_{-h}^{h} \sum_j \{A_j^2 \sin^2(\lambda_j t + \alpha_j)\}\,dt,$$

since the cross-product terms will contribute only a finite amount to the integral and hence
vanish in the limit,

$$= \lim \frac{1}{2h}\int_{-h}^{h} \sum_j \left[\tfrac{1}{2} A_j^2 \{1 - \cos 2(\lambda_j t + \alpha_j)\}\right] dt = \tfrac{1}{2}\sum_j A_j^2.$$

Similarly for $u(t)\,u(t+k)$ we have

$$\lim \frac{1}{2h}\int_{-h}^{h} \sum_j \left[A_j \sin(\lambda_j t + \alpha_j)\,A_j \sin(\lambda_j t + \lambda_j k + \alpha_j)\right] dt$$

$$= \lim \frac{1}{2h}\int_{-h}^{h} \sum_j \tfrac{1}{2} A_j^2 \left[\cos \lambda_j k - \cos\{\lambda_j(2t + k) + 2\alpha_j\}\right] dt = \tfrac{1}{2}\sum_j A_j^2 \cos \lambda_j k.$$

Thus

$$r(k) = \frac{\sum_j (A_j^2 \cos \lambda_j k)}{\sum_j A_j^2}.$$

The correlogram is the sum of a series of harmonics, like the original series, but the
coefficients are different and the harmonics are all in phase.
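The closing formula of this example can be written down directly. The sketch below evaluates $r(k) = \sum A_j^2 \cos\lambda_j k / \sum A_j^2$ for given amplitudes and angular frequencies; the phases $\alpha_j$ do not enter, which is the sense in which the harmonics of the correlogram are all in phase. The function name is illustrative.

```python
import math


def harmonic_autocorrelation(amplitudes, lambdas, k):
    """r(k) for a sum of undamped harmonics A_j * sin(lambda_j * t + alpha_j)."""
    total = sum(A * A for A in amplitudes)
    weighted = sum(A * A * math.cos(lam * k) for A, lam in zip(amplitudes, lambdas))
    return weighted / total


# Two harmonics; the larger amplitude dominates the correlogram.
values = [harmonic_autocorrelation([1.0, 0.5], [0.3, 1.1], k) for k in range(0, 40)]
```

Since the cosines never decay, the values oscillate indefinitely, in contrast with the damped correlogram of the autoregressive scheme.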


30.39. The idea underlying the autoregressive scheme of representing time-series
may perhaps be best illustrated by an analogy. Imagine a motor-car proceeding along
a horizontal road with an irregular surface. The car is fitted with springs which permit
it to oscillate to some extent but are designed to damp out the oscillations as soon as the
comfort of the passengers will permit. If the car strikes a bump or a pothole in the road
the body will oscillate up and down for a time but will soon come to rest so far as vertical
motion is concerned. If, however, it proceeds over a continual succession of bumps there
will be continual oscillation of varying amplitude and distance between peaks. The oscillations
are continually renewed by disturbances, though the distribution of the latter along
the road may be quite random. The regularity of the motion is determined by the internal
structure of the car: but the existence of the motion is determined by external impulses.

30.40. It appears to me very plausible to suppose that oscillations in time-series
are generated in this way. One does not have to postulate some external rhythmic influence
which keeps the oscillation going, or to suppose that the system will oscillate without
damping once it has been set in motion. Nor is it necessary to assume that the majority
of the deviations between theory and observation are due to "errors" which exert no
effect on the subsequent movement of the system. The reader, however, will have to
form his own opinion on this matter.* We now proceed to examine an alternative scheme
of representation in which the series is represented as a sum of (undamped) cyclic terms.

* The scheme considered in this chapter may over-simplify natural conditions in that it assumes
finite random disturbances at equidistant time-intervals. If the intervals are not equal, or if the
disturbances are small and continually occurring, the autoregressive scheme is only an approximation.
Much remains to be done on this subject.


Periodogram Analysis

30.41. It is well known that under certain general conditions a function $f(t)$ can be
expanded in the Fourier series, valid in a certain range,

$$\begin{aligned}
f(t) = {} & a_0 + a_1 \cos\frac{\pi t}{\lambda_1} + a_2 \cos\frac{2\pi t}{\lambda_1} + a_3 \cos\frac{3\pi t}{\lambda_1} + \ldots \\
& + b_0 + b_1 \sin\frac{\pi t}{\lambda_1} + b_2 \sin\frac{2\pi t}{\lambda_1} + b_3 \sin\frac{3\pi t}{\lambda_1} + \ldots
\end{aligned} \tag{30.64}$$

Functions which are not periodic can be expanded in this way; for instance, in the
range $0 < x < \pi$,

$$\frac{x}{2} = \sin x - \frac{1}{2}\sin 2x + \frac{1}{3}\sin 3x - \frac{1}{4}\sin 4x + \ldots$$

The function, of course, repeats itself in the range $\pi < x < 2\pi$, and so on.

As a representation of observed series the Fourier series is rather restricted in scope,
since the period of every term is a multiple of the fundamental period $2\lambda_1$. A more general
scheme is provided by the series

$$\begin{aligned}
f(t) = {} & a_0 + a_1 \cos\frac{2\pi t}{\lambda_1} + a_2 \cos\frac{2\pi t}{\lambda_2} + \ldots \\
& + b_0 + b_1 \sin\frac{2\pi t}{\lambda_1} + b_2 \sin\frac{2\pi t}{\lambda_2} + \ldots
\end{aligned} \tag{30.65}$$

or the alternative form

$$f(t) = A_0 + A_1 \cos\left(\frac{2\pi t}{\lambda_1} + \alpha_1\right) + A_2 \cos\left(\frac{2\pi t}{\lambda_2} + \alpha_2\right) + \ldots \tag{30.66}$$

Here the $\lambda$'s are not necessarily commensurable. The object of our analysis is first of all
to find out what are the best values of the $\lambda$'s to select, and secondly to evaluate the other
constants $a$ and $b$, or $A$ and $\alpha$.

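30.42. Suppose we wish to test whether a time-series contains a harmonic term with
period $\mu$. Consider the sums

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j \cos\frac{2\pi j}{\mu}, \tag{30.67}$$

$$B = \frac{2}{n}\sum_{j=1}^{n} u_j \sin\frac{2\pi j}{\mu}, \tag{30.68}$$

and write

$$S^2 = A^2 + B^2 = \left|\frac{2}{n}\sum_{j=1}^{n} u_j \exp\frac{2\pi i j}{\mu}\right|^2. \tag{30.69}$$

* Some writers define these sums with $j$ from 0 to $n - 1$. The signs of $A$ and $B$ may then differ
from those given by (30.67) and (30.68), but the intensity and phase are unaffected.

Suppose that the series is in fact given by

$$u_j = a \sin\frac{2\pi j}{\lambda} + b_j, \tag{30.70}$$

where $b_j$ is a component which we will assume to contain no cyclical element, so that its
correlation with the other component is zero, at least for long series. Then we have

$$A = \frac{2a}{n}\sum_{j=1}^{n} \sin\frac{2\pi j}{\lambda}\cos\frac{2\pi j}{\mu} + \frac{2}{n}\sum_{j=1}^{n} b_j \cos\frac{2\pi j}{\mu},$$

and the second term may be neglected. Thus, writing

$$\alpha = \frac{2\pi}{\lambda}, \qquad \beta = \frac{2\pi}{\mu},$$

we have

$$\begin{aligned}
A &= \frac{2a}{n}\sum_{j=1}^{n} (\sin \alpha j \cos \beta j) \\
&= \frac{a}{n}\sum_{j=1}^{n} \{\sin(\alpha - \beta)j + \sin(\alpha + \beta)j\} \\
&= \frac{a}{n}\left\{\frac{\sin\tfrac{1}{2}(\alpha - \beta)n\,\sin\tfrac{1}{2}(\alpha - \beta)(n + 1)}{\sin\tfrac{1}{2}(\alpha - \beta)} + \frac{\sin\tfrac{1}{2}(\alpha + \beta)n\,\sin\tfrac{1}{2}(\alpha + \beta)(n + 1)}{\sin\tfrac{1}{2}(\alpha + \beta)}\right\}.
\end{aligned} \tag{30.71}$$

For large $n$ this remains small unless $\alpha$ approaches $\beta$ (or $-\beta$, which is essentially the same
situation), and in that case we have

$$A \approx a \sin\tfrac{1}{2}(\alpha - \beta)(n + 1).$$

Similarly,

$$B \approx a \cos\tfrac{1}{2}(\alpha - \beta)(n + 1),$$

so that

$$S^2 = A^2 + B^2 \approx a^2. \tag{30.72}$$

Thus $S$ remains small unless the "trial" period $\mu$ approaches the real period $\lambda$, and in that
case equals the amplitude $a$.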

30.43. Similarly we may expect that if the series consists of a sum of harmonics
with periods $\lambda_1, \lambda_2, \ldots, \lambda_m$, $S$ will be small, unless $\mu$ is equal to one of these periods, in
which case it is finite and equal to the amplitude of the term concerned.

This result forms the basis of what is known as periodogram analysis. We select
a number of trial periods for different values of $\mu$ and calculate $S^2$ for each of them. $S^2$,
which is called the intensity, is then exhibited as a function of $\mu$, and graphed as ordinate
against $\mu$ as abscissa. The diagram obtained by joining the points, each to the next, is
called the periodogram. If this figure has peaks at certain values $\lambda_1, \ldots, \lambda_m$ and we are
prepared to assume that these are not sampling accidents, the values are the appropriate
periods of harmonic terms and the intensity $S^2$ provides the corresponding amplitudes.
The quantities $A$ and $B$ of (30.67) and (30.68) are obtained incidentally and provide the
phase angles $\alpha$ of (30.66). We shall illustrate the arithmetic processes below.
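The arithmetic of the periodogram can be sketched as follows. The code assumes the sums (30.67) and (30.68) run over $j = 1, \ldots, n$ with the factor $2/n$, and computes $A$, $B$ and the intensity $S^2 = A^2 + B^2$ for a chosen trial period $\mu$; the function names and the artificial test series are illustrative.

```python
import math


def intensity(u, mu):
    """Return (A, B, S2) for trial period mu; u[0] is taken as the term j = 1."""
    n = len(u)
    A = (2.0 / n) * sum(u[j - 1] * math.cos(2 * math.pi * j / mu) for j in range(1, n + 1))
    B = (2.0 / n) * sum(u[j - 1] * math.sin(2 * math.pi * j / mu) for j in range(1, n + 1))
    return A, B, A * A + B * B


def periodogram(u, trial_periods):
    """Intensity S2 for each trial period, in the order given."""
    return [intensity(u, mu)[2] for mu in trial_periods]


# Artificial series: a single harmonic of amplitude 3 and period 10, 400 terms.
series = [3.0 * math.sin(2 * math.pi * j / 10.0) for j in range(1, 401)]
trial = [7.0, 10.0, 17.0]
print(periodogram(series, trial))
```

With a real series the peaks are blurred by the non-cyclical component, so that the question of whether a peak is a sampling accident remains to be settled separately.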


30.44. Fig. 30.9 shows the periodogram of the wheat-price index data of Table 30.1. 
In order not to confuse the diagram for lower values of the trial period we have shown 
only the major fluctuations. The length of the series was about 300 years from 1545 to 
1844, earlier and later figures shown in Table 30.1 not having been taken into account. 
The primary data have been taken from Sir William Beveridge’s classical paper (1922) and 
are shown in Table 30.9. For practical reasons which will emerge presently, certain trial 
periods are taken not over exactly 300 years but over the number N of years shown in 
the table. To reduce the figures to comparability, Beveridge therefore multiplied the
sum $A^2 + B^2$ by $N/300$.

TABLE 30.9

Periodogram Analysis of the Beveridge Wheat-Price Index Data of Table 30.1.

(From J.R.S.S., 1922, 85, 412.)

The first observation relates to 1545, except where A and B are given in heavy type.

 Period    Number of     A        B      Intensity
(Years)     Years N                      N(A^2 + B^2)/300

  2.000    300    +0.11              0.01
  2.049    330    -0.40     0.09     0.19
  2.054    304    +0.48    -0.72     0.77
  2.061    340    +0.38    -0.57     0.54
  2.069    300    +0.25    +0.63     0.46
  2.074    336    -0.61    +0.51     0.71
  2.080    312    +0.92    -0.50     1.14
  2.087    288     0.52    -0.11     0.27
  2.095    308    -0.91    +0.90     1.69
  2.105    320    +0.90    +0.07     0.86
  2.112    288    +0.90    +0.80     1.38
  2.133    320    +0.89    +0.15     0.84
  2.154    308    +0.48    +0.23     0.29
  2.182    288    +1.32    -0.59     1.99
  2.200    308    -0.13    -0.60     0.39
  2.222    320    -0.32     0.62     0.52
  2.261    312    +0.50     0.22     0.31
  2.286    320    -0.38    -0.85     0.93
  2.316    308    +1.39    -1.05     3.11
  2.333    308    -0.10     0.25     0.08
  2.353    320    +0.90    +0.07     0.86
  2.364    312    -0.12    -0.63     0.43
  2.370    320    +0.05    -0.28     0.08
  2.375    304    +0.29     0.43     0.27
  2.381    300    -0.19     1.22     1.53
  2.385    310    -1.00    -0.89     1.86
  2.391    330    -1.30    -0.54     2.18
  2.395    309     0.72    +0.60     0.90
  2.400    312    +0.34    +0.68     0.60
  2.412    328    -0.08     0.65     0.47
  2.417    348    +0.63    +0.57     0.69
  2.435    336    +0.44    +0.01     0.22
  2.452    304     1.40    -0.51     2.23
  2.462    320    -0.25    +1.49     2.44
  2.476    312    -0.38    +0.35     0.27
  2.483    288    -0.07    +0.74     0.53
  2.500    320     0.24    +1.19     1.56
  2.512    324    +0.86    +0.39     0.97
  2.516    312    +0.45    +0.24     0.26
  2.529    301     0.19    -0.31     0.13
  2.545    336    -1.39    -0.81     2.89
  2.555    322    +0.38    +0.50     0.42
  2.571    306    +1.25    +0.55     1.01
  2.588    308    +0.30    +0.43     0.28
  2.600    312    +1.02    -0.39     1.25
  2.615    306    -0.75    -0.24     0.63
  2.625    294    -0.45    +1.36     2.01
  2.643    296    +0.95    -0.62     1.27
  2.667    312    -0.92    +1.20     2.38
  2.687    301    +1.23    -0.02     1.52
  2.692    315    -0.04    +0.23     0.06
  2.706    322    -0.27    +1.33     1.97
  2.714    304    +0.83    +1.17     2.10
  2.727    300    +0.86    +1.46     2.87
  2.733    287    +2.05              6.16
  2.735    279    +2.44    +1.23     7.82
  2.737    312    +2.23    +1.00     6.22
  2.741    290    +2.43    +0.25     5.86
  2.750    308    +0.90    -0.84     1.55
  2.762    348    -0.57    -0.04     0.37
  2.769    324    +1.49    +0.23     2.28
  2.778    325    +1.20    -0.92     2.48
  2.800    336    -1.01    -0.19     1.18
  2.818    310    +0.55    +1.07     1.49
  2.833    323    +0.78    -0.10     0.67
  2.840    290    +0.41    +0.42     0.34
  2.857    320    +0.90    +0.21     1.03
  2.875    322    +0.35    +0.14     0.15
  2.888    312    +1.51    +0.20     2.43
  2.895    330    -0.69    -1.57     3.21
  2.909    320    +0.70    -1.11     1.84
  2.933    308    -0.04    +0.39     0.16
  2.947    336    -0.93    -1.19     2.57
  2.960    296     0.00     1.15     1.30
  3.000    300     0.29    -0.39     0.23
  3.040    304    +0.09    +0.75     0.58
  3.077    320    +0.05    +1.18     1.50
  3.111    336    +0.91    -0.44     1.15
  3.143    308    +2.01    +0.23     4.20
  3.167    304    +0.46    -1.05     1.33
  3.200    320    +0.43    +0.95     1.16
  3.217    296    +1.25    +0.00     1.55
  3.250    312    -1.22    -0.47     1.80
  3.273    324    -0.55    +1.18     1.82
  3.286    322    -0.11    +0.99     1.07
  3.304    304    +0.13    +0.75     0.59
  3.333    320    +0.90    +1.58     3.54
  3.364    296    +1.76    +0.98     4.00
  3.375    324    +0.55    +0.92     1.24
  3.385    308    +0.35    +1.03     1.21
  3.400    323    +1.12    +2.37     7.41
  3.407    276    +2.98    +2.81    14.90
  3.412    348    +1.27    -3.98    15.53
  3.417    328    +3.08    -2.24    15.84
  3.429    288    +3.11    -1.40    11.16
  3.444    310    +0.09    -0.99     1.03

TABLE 30.9 (continued)

 Period    Number of     A        B      Intensity
(Years)     Years N                      N(A^2 + B^2)/300

  3.455    304    +0.55    +0.29     0.39
  3.462    315    +1.57    +1.02     4.87
  3.500    308    +1.20    -0.94     2.38
  3.524    296    +1.41    -1.18     3.31
  3.538    322    +0.50    -1.45     2.53
  3.556    320    +0.02    -0.43     0.20
  3.571    325    +0.80    -0.69     1.21
  3.600    324    -1.03    +0.82     1.88
  3.610    304    +1.18    +1.23     2.94
  3.636    320    +1.14    +0.13     1.39
  3.643    306    -0.16    +0.27     0.10
  3.667    308    -2.14    -1.07     5.87
  3.679    309    +0.34    -1.90     3.83
  3.692    288    +1.28    -0.22     1.63
  3.700    296    +0.90    -0.59     1.18
  3.714    312    +1.15    +1.78     4.65
  3.727    287     0.45    -1.65     2.72
  3.750    315    +0.64    -0.06     0.44
  3.778    306    -1.17    -0.68     1.86
  3.800    304    +1.60    +0.80     3.24
  3.833    322    -1.12    -1.63     4.17
  3.857    324    +1.63    +0.45     3.08
  3.888    280    -0.15    +0.66     0.43
  3.895    296    -0.66    +1.00     1.42
  3.923    306    +0.64    -1.61     3.06
  3.962    309    -0.67    +1.74     3.59
  4.000    300    +1.47    -1.13     3.64
  4.077    318    +0.57    -0.26     0.41
  4.111    296    +1.13    -1.70     4.13
  4.143    290     0.50    +0.23     0.30
  4.167    325    +1.21    +0.32     1.70
  4.173    322    +0.66    -1.46     2.77
  4.200    294    -0.99    -0.41     1.02
  4.250    323    +0.50    -2.73     8.32
  4.286    300    -0.65    +0.79     1.04
  4.333    312    -1.50    -1.30     4.10
  4.353    296    -2.85    -0.24     8.05
  4.364    288    -2.98    +0.75     9.07
  4.375    315    -2.47    +0.87     7.19
  4.385    342    -0.50    +2.55     7.72
  4.400    308    -1.38    +3.27    12.89
  4.412    300    +0.08    +3.62    13.11
  4.417    318    +0.87    +3.85    16.48
  4.429    310    +1.80    +2.41     9.32
  4.444    320    +2.15    +0.83     5.66
  4.471    304    +0.91    +0.79     1.48
  4.500    306    +1.87    +0.72     4.09
  4.571    320    -0.21    +0.04     0.22
  4.600    322    -0.08    +1.24     1.65
  4.667    336    +0.19    +0.93     1.00
  4.750    304    -0.12    +2.28     5.28
  4.800    288    +2.44    +1.08     6.84
  4.857    306    -1.06    -1.30     2.89
  4.888    312    -1.80    +2.11     8.00
  4.933    296    +1.57    +1.58     4.91
  5.000    300    +1.85    +1.00     4.30
  5.067    304    -0.05    +3.98    16.09
  5.091    336     0.73    +5.55    35.05
  5.100    306    +5.71    +2.98    42.34
  5.111    322    +5.70    +0.29    34.91
  5.125    328    +3.97    +2.90    26.38
  5.143    324    +2.46    +2.46    13.09
  5.200    312    +0.02    +0.30     0.10
  5.250    294    +1.74    +1.92     6.56
  5.333    320    +0.71    -4.46    21.72
  5.400    324    +1.04    +3.71    16.06
  5.415    325    +4.27    +1.90    23.66
  5.429    304    +4.72    -0.28    22.61
  5.455    300    +1.37    -3.73    15.76
  5.500    308    -1.04    +1.49     3.39
  5.555    300    +2.40    -0.68     6.23
  5.600    336    +0.46    +1.21     1.88
  5.667    306    +5.31    -1.97    32.72
  5.692    296    +2.05    -3.91    19.18
  5.714    320    +0.35    -2.13     4.97
  5.750    322    +1.39    -0.33     2.18
  5.800    290    +3.55    -2.75    19.47
  5.846    304    +0.00    -2.29     5.35
  5.933    356    +4.37    +0.91    23.63
  6.000    300    -3.50    -0.12    12.29
  6.111    330    -0.79    -1.90     4.66
  6.143    301    +0.74    -2.96     9.32
  6.167    296    -0.22    -2.94     8.56
  6.200    310    -2.02    -3.38    16.02
  6.250    325    -3.23    -0.11    11.30
  6.286    308    -1.72    -0.59     3.41
  6.333    304    -1.52    +1.29     4.02
  6.400    320    +0.80    +2.74     8.71
  6.500    312    +0.69    -0.73     0.94
  6.571    322    +1.49    -0.77     3.02
  6.667    320    +0.25    +0.21     0.11
  6.727    296    +0.08    -0.13     0.02
  6.750    324    -0.20    -1.66     3.01
  6.800    306    +0.23    -0.65     0.48
  6.909    304    +0.58    +2.56     7.00
  6.933    312    +1.68    +2.01     7.15
  7.000    308    +3.10    -2.17    14.74
  7.143    300    +1.83    -1.86     6.79
  7.200    324    +0.54    -3.93    16.96
  7.333    308    +1.52    -2.81    10.46
  7.400    296    -2.33    -2.72    12.65
  7.417    356    +1.50    -4.01    21.72
  7.429    312    -3.80    -1.49    17.28
  7.500    315    +0.17    +1.50     2.40
  7.600    304    -2.33    -1.37     7.43
  7.667    322    -1.46    -2.61     9.57
  7.750    310    +1.38    -0.39     2.13
  7.857    330     0.50    +0.28     0.36

TABLE 30.9 (continued)

 Period    Number of     A        B      Intensity
(Years)     Years N                      N(A^2 + B^2)/300

  8.000    312    -3.96    +1.34    18.67
  8.091    356    +4.32    -0.98    23.23
  8.200    287    +1.62    -0.64     2.90
  8.222    296    +0.19    -0.56     0.34
  8.333    325    +0.21    +0.91     0.95
  8.500    323    +0.17    +3.19    10.41
  8.667    312    +2.51    -1.01     7.59
  8.800    308    +2.97    +0.83     9.77
  9.000    306    -1.51    -0.57     2.65
  9.200    322    -0.16    -1.56     2.65
  9.333    336    -0.74    +0.64     1.08
  9.500    304    +1.08    +1.07     2.26
  9.667    290    +5.03    +0.37    24.55
  9.750    312    +4.46    -3.56    33.89
  9.818    324    +1.21     4.94    27.90
 10.000    320    -1.19    -0.83     2.25
 10.200    306    +0.86    -0.22     0.80
 10.250    328    -0.69    +1.10     1.84
 10.400    312    +1.88    -1.65     6.52
 10.500    294    +2.46    -1.82     9.19
 10.750    301    +1.47    -3.13    11.98
 10.800    324    +1.00    -4.75    25.48
 11.000    308    -3.85    -4.26    33.84
 11.200    336    -2.48    +0.55     7.24
 11.500    322    -1.32    -0.66     2.34
 11.667    280    +0.46    +1.42     2.07
 12.000    312    -2.47     4.04    23.30
 12.143    340    -0.22     4.37    21.66
 12.333    296    -2.44    +2.74    11.43
 12.500    325    -1.22    +2.63     9.13
 12.667    304    +2.28    +5.19    32.58
 12.800    320    +5.70    +3.26    46.01
 12.875    309    +6.46    +0.77    43.58
 13.000    312    +4.26    -4.32    38.23
 13.333    320    +0.40    +0.37     0.32
 13.500    324    +2.56     2.09    11.79
 13.667    328    +3.49    -1.34    15.28
 14.000    308    +1.15    -1.00     2.38
 14.500    290    -3.78    -0.18    13.82
 14.667    308    -1.50    +4.23    20.69
 15.000    300    +6.32    -2.66    46.83
 15.200    304    +1.19    -8.52    75.04
 15.250    305    -0.28    -8.65    76.17
 15.286    321    -2.35    -7.15    60.62
 15.333    322    -3.89    -6.55    62.29
 15.500    310    -6.92    -2.02    59.11
 16.000    320    -1.46    +4.52    24.02
 16.667    300    +5.21    -0.39    27.33
 17.000    306    +2.56    -6.35    47.84
 17.333    312    -3.04    -6.65    54.55
 17.500    280    -6.18    -4.45    54.12
 18.000    306    -4.40    +1.26    21.29
 18.500    296    -1.46    +2.25     7.10
 19.000    304    +1.00    -0.23     1.07
 19.750    316    -4.73    -1.59    26.25
 20.000    320    -5.71    +1.69    37.88
 21.000    294    +0.78    +2.61     7.28
 22.000    308    +1.87    +1.68     6.18
 23.000    322    -2.45    -1.43     8.61
 24.000    288    +0.45    +5.19    26.10
 24.667    296    +4.31    +1.99    22.21
 25.000    325    +3.86    -0.19    14.94
 26.000    312    +1.23    -1.34     3.43
 27.000    324    +0.50    -0.33     0.38
 28.000    308    -0.49    +0.68     0.72
 29.000    290    +1.08    -2.12     5.46
 30.000    300    -1.53    -2.34     7.81
 31.000    310    -1.98    +0.13     4.06
 32.000    320    -0.37    +0.51     0.42
 33.000    330    +0.96    -0.78     1.68
 34.000    306    -3.00    -2.15    13.90
 35.000    280    -4.64    +1.79    23.11
 36.000    288    -1.65    +4.85    23.29
 37.000    296    +2.08    +3.92    19.47
 38.000    304    +2.99    +0.56     9.37
 40.000    320    -1.44    -0.63     2.63
 41.000    328    -1.93    +0.93     5.01
 42.000    294    +0.93    +3.02     9.75
 44.000    308    +3.00    -0.14     9.27
 45.000    315    +1.69    -1.99     7.14
 46.000    322    +0.16    -2.27     5.58
 48.000    288    -0.76    -0.09     0.56
 50.000    300    +1.83    +2.19     8.14
 52.000    312    +4.77    -0.57    24.03
 53.000    318    +4.22    -2.60    26.08
 54.000    324    +2.84    -4.01    26.09
 55.000    330    +3.54    -3.30    25.82
 56.000    336    +3.31    -2.36    18.47
 58.000    290    +3.89    +1.49    16.82
 60.000    300    -3.08    -0.93    10.32
 62.000    310    -1.62    +0.39     2.88
 64.000    320    -0.78    +0.13     0.66
 66.000    330    -0.56    -0.56     0.69
 68.000    340    +2.90    -1.88    13.58
 70.000    280    -0.69    -0.16     0.47
 74.000    296    -1.20    +0.82     2.07
 76.000    304    -0.66    +1.17     1.83
 78.000    312    +0.58    +1.26     2.00
 80.000    320    +0.76    +0.82     1.34
 84.000    336    +0.26    +0.69     0.62

An examination of the periodogram suggests the possibility of 20 periods, as follows:


 Period    Corrected Intensity       Period    Corrected Intensity
(Years)    N(A^2 + B^2)/300         (Years)    N(A^2 + B^2)/300

  2.735          7.82                11.000         33.84
  3.417         15.84                12.000         23.30
  4.417         16.48                12.800         46.01
  5.100         42.34                15.250         76.17
  5.415         23.66                17.333         54.55
  5.667         32.72                20.000         37.88
  5.933         23.63                24.000         26.10
  7.417         21.72                35.000         23.29
  8.091         23.23                54.000         26.09
  9.750         33.89                68.000         13.58


This is evidently rather an embarrassing profusion of possibilities, and we cannot 
immediately accept all these periods as significant. Sir William discussed them in detail 
in the original paper and was inclined to attribute reality to 18 or 19 of them, partly on 
grounds which do not concern us here, such as the existence of weather oscillations with 
these "periods". In particular, where a period had a high intensity he analysed the
two halves of the series separately to see whether the periods persisted, finding that most 
of them did. 


30.45. An inspection of the correlogram of the series in Fig. 30.5 reveals a striking
difference between the two methods of analysis. From the correlogram we should be
inclined to suspect a mean period of about 15 years, corresponding to the peak of greatest
intensity in the periodogram, with a subsidiary ripple of about 5 to 6 years' period,
corresponding to one or more of the peaks in the periodogram; but of the other 18 periods there
is no sign. The conclusion is inevitable that either the correlogram is insensitive or the
periodogram is misleading. Having raised this highly important question we shall,
unfortunately, have to leave it unsettled in part; but we shall show that at least three-quarters
of the periods thrown up for consideration by the periodogram are not significant.


30.46. The calculation of the intensity $S^2$ depends on that of the quantities $A$ and $B$
of equations (30.67) and (30.68). Suppose in the first place that our trial period $\mu$ is an
integer. We then write down the series in rows of $\mu$, thus:

$$\begin{array}{llllll}
 & u_1 & u_2 & u_3 & \ldots & u_\mu \\
 & u_{\mu+1} & u_{\mu+2} & u_{\mu+3} & \ldots & u_{2\mu} \\
 & \ldots & & & & \\
 & u_{(p-1)\mu+1} & u_{(p-1)\mu+2} & u_{(p-1)\mu+3} & \ldots & u_{p\mu} \\
\hline
\text{Totals} & m_1 & m_2 & m_3 & \ldots & m_\mu
\end{array} \tag{30.73}$$

We continue writing down the rows until there are fewer than $\mu$ terms remaining, the
extra terms being left out of account. The number $p\mu$ is then as near in multiples of $\mu$
as we can get to the number in the series $n$, and may be denoted by $N$. This array is
sometimes known as the Buys-Ballot table.

We then form the sum

$$\frac{2}{p\mu}\left\{m_1\cos\frac{2\pi}{\mu} + m_2\cos\frac{4\pi}{\mu} + \ldots + m_\mu\cos\frac{2\pi\mu}{\mu}\right\} \tag{30.74}$$

and this is clearly the quantity $A$ of (30.67) for the series of $N$ terms. Similarly we have

$$B = \frac{2}{p\mu}\left\{m_1\sin\frac{2\pi}{\mu} + m_2\sin\frac{4\pi}{\mu} + \ldots + m_\mu\sin\frac{2\pi\mu}{\mu}\right\}. \tag{30.75}$$

If the trial period $\mu$ is a rational fraction $\nu/\sigma$ we write the series down in rows of $\nu$ and
proceed in the same way; and if it is irrational or is a number which gives a large value
of $\nu$ when expressed as a fraction, we take two convenient neighbouring values of $\mu$ and
interpolate in the periodogram.
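The equivalence between the column totals of the Buys-Ballot table and the direct sums can be illustrated with a short sketch. It assumes an integer trial period and the $2/N$ normalisation used above; the function names and the test series are hypothetical.

```python
import math

def buys_ballot_totals(u, mu):
    """Column totals m_1..m_mu for integer trial period mu.

    Returns (totals, N), where N = p * mu is the number of terms actually used.
    """
    p = len(u) // mu
    totals = [sum(u[r * mu + k] for r in range(p)) for k in range(mu)]
    return totals, p * mu

def components_from_totals(totals, n_used):
    """A and B computed from the column totals only."""
    mu = len(totals)
    a_comp = 2.0 / n_used * sum(m * math.cos(2.0 * math.pi * (k + 1) / mu)
                                for k, m in enumerate(totals))
    b_comp = 2.0 / n_used * sum(m * math.sin(2.0 * math.pi * (k + 1) / mu)
                                for k, m in enumerate(totals))
    return a_comp, b_comp

# Hypothetical series of 100 terms, analysed with trial period 7 (14 rows, N = 98).
series = [math.sin(2.0 * math.pi * j / 7.0) + 0.3 * math.cos(j) for j in range(1, 101)]
totals, n_used = buys_ballot_totals(series, 7)
a_bb, b_bb = components_from_totals(totals, n_used)

# Direct sums over the same N terms, for comparison.
a_direct = 2.0 / n_used * sum(series[j - 1] * math.cos(2.0 * math.pi * j / 7.0)
                              for j in range(1, n_used + 1))
b_direct = 2.0 / n_used * sum(series[j - 1] * math.sin(2.0 * math.pi * j / 7.0)
                              for j in range(1, n_used + 1))
```

The saving is that only $\mu$ multiplications by a cosine or sine are needed, instead of $N$.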


30.47. In actual practice we do not write down the array (30.73). The sums $m$
may be formed on an adding machine by starting with $u_1$ and then adding every $\mu$th
member to give $m_1$; then starting with $u_2$ and adding every $\mu$th member to give $m_2$, and so on.
Or alternatively, the values may be written on cards, one for each member of the series,
and the pack dealt into $\mu$ heaps. The total of the $m$'s, together with any members left
over, equals the sum of the series and provides a check on the work.
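The dealing of the series into heaps, and the check on the totals, might look like this in code. It is a small illustration with made-up data; `deal_into_heaps` is a hypothetical helper name.

```python
def deal_into_heaps(u, mu):
    """Sum every mu-th member, starting from each of the first mu members in turn.

    Only the first p * mu terms are dealt; the remainder is returned separately.
    """
    p = len(u) // mu
    n_used = p * mu
    heaps = [sum(u[k:n_used:mu]) for k in range(mu)]
    leftover = u[n_used:]
    return heaps, leftover

series = list(range(1, 24))            # 23 made-up observations: 1, 2, ..., 23
heaps, leftover = deal_into_heaps(series, 5)

# The check described in the text: totals plus leftovers equal the sum of the series.
check_ok = sum(heaps) + sum(leftover) == sum(series)
```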


Example 30.8

Consider the Beveridge series of Table 30.1. For the trial period 2 we may take 300
terms of the series, and $m'_1$ (about zero mean) will be the sum of the values $u_1, u_3, \ldots, u_{299}$
and $m'_2$ will be the sum of the values with even subscripts. These sums are, for the years
1545 to 1844 inclusive,

$$m'_1 = 14{,}909, \qquad m'_2 = 14{,}893.$$

The mean is 14,901, so that about the mean of the series

$$m_1 = +8, \qquad m_2 = -8.$$

Now, for a trial period 2, $\sin\frac{2\pi j}{2}$ vanishes and hence $B = 0$. For $A$ we have (in our
notation, which gives different signs from Beveridge's to $A$ and $B$)

$$A = \frac{2}{300}\left\{m_1\cos\frac{2\pi}{2} + m_2\cos\frac{4\pi}{2}\right\} = \frac{2}{300}\{m_2 - m_1\} = -\frac{32}{300} = -0.11.$$

Thus

$$S^2\ (\text{corrected}) = \frac{300}{300}A^2 = 0.01,$$

as shown in Table 30.9.
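The arithmetic for the trial period 2 can be retraced in a few lines. The figures are those quoted in the example; the variable names are illustrative only.

```python
import math

# Sums of the odd- and even-numbered terms for 1545 to 1844, as quoted above.
m1_raw, m2_raw = 14909, 14893
n_used = 300

mean_total = (m1_raw + m2_raw) / 2        # 14,901
m1 = m1_raw - mean_total                  # +8
m2 = m2_raw - mean_total                  # -8

# Trial period 2: the cosine multipliers are cos(pi) = -1 and cos(2 pi) = +1.
a_comp = 2.0 / n_used * (m1 * math.cos(2.0 * math.pi / 2.0) + m2 * math.cos(4.0 * math.pi / 2.0))
corrected_intensity = n_used / 300.0 * a_comp ** 2
```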

For a trial period 2.600, we could take $\mu = \frac{13}{5}$ and arrange the series in rows of 13,
requiring 23 rows accounting for 299 values of the series. We may, however, save
ourselves some arithmetic by taking 24 rows, a multiple of 4, occupying 312 observations.
Or rather, we take 6 rows of 52, giving us the values for a trial period 52; then add $m_1$
to $m_{27}$, $m_2$ to $m_{28}$ and so on, giving the result we would have got by taking 12 rows of 26
and hence providing the values for a trial period of 26; then we add again in the same way,
and so on, obtaining successively the values of $m$ required for trial periods of 13, 6.5, and
3.25. Similarly, by multiplying the original 52 values of $m$ by the respective values of
$\cos\frac{20\pi j}{52}$ and $\sin\frac{20\pi j}{52}$ we get the values of $A$ and $B$ required for a trial period of $\frac{52}{10}$. It is
thus evident that we can use the single set of 52 values of $m$ to provide the required constants
for trial periods $\frac{52}{2}$, $\frac{52}{3}$ and so forth. This is the main reason why, in Table 30.9, 312
observations are shown as $N$ for the trial periods 2.080, 2.261, 2.364, 2.476, 2.600, 2.737,
2.888, 3.250, 3.714, 4.333, 5.200, 6.500, 7.429, 8.667, 10.400, 13.000, 17.333, 26.000 and
52.000. The arithmetic, though difficult enough, is not as laborious as appears at first sight.
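The economy described here, where one set of 52 column totals serves a whole family of trial periods, can be sketched as follows. The data are made up, and the helper names (`column_totals`, `fold`, `cosine_component`) are hypothetical.

```python
import math

def column_totals(u, width, rows):
    """Column totals of a Buys-Ballot table with the given width and number of rows."""
    return [sum(u[r * width + k] for r in range(rows)) for k in range(width)]

def fold(totals):
    """Add the second half of the totals to the first: totals for half the period."""
    half = len(totals) // 2
    return [totals[k] + totals[k + half] for k in range(half)]

def cosine_component(totals, n_used, harmonic):
    """A for trial period len(totals) / harmonic, from the column totals alone."""
    width = len(totals)
    return 2.0 / n_used * sum(m * math.cos(2.0 * math.pi * harmonic * (k + 1) / width)
                              for k, m in enumerate(totals))

series = [(j * j) % 17 - 8 for j in range(312)]     # 312 made-up integer observations
m52 = column_totals(series, 52, 6)                  # 6 rows of 52
m26 = fold(m52)                                     # same as 12 rows of 26
m13 = fold(m26)                                     # same as 24 rows of 13

# A for trial period 52/10, from the 52 totals and by the direct sum.
a_from_totals = cosine_component(m52, 312, 10)
a_direct = 2.0 / 312 * sum(x * math.cos(2.0 * math.pi * 10 * (j + 1) / 52)
                           for j, x in enumerate(series))
```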


30.48. There is an interesting relation between the periodogram and the correlogram
by which the latter, in theory, determines the former. We consider, as in 30.38, a function
$u(t)$ defined at every point of time in some range $-h$ to $h$. Then

$$\alpha(p) + i\beta(p) = \frac{1}{h}\int_{-h}^{h} e^{ipt}\,u(t)\,dt = \frac{1}{h}\int_{-h}^{h}\cos pt\;u(t)\,dt + \frac{i}{h}\int_{-h}^{h}\sin pt\;u(t)\,dt \tag{30.76}$$

corresponds to the sums of (30.67) and (30.68) and may be written $A + iB$, where



. (30.77) 


It follows that the intensity $S^2$ is related to the Fourier transform of $r(k)$ by the relation,
derived from (30.63),

S 2 = 2$ r (p) 

= ~\ r (k) e**® dk, . . . (30.78) 

which is true also in the limit, subject to conditions of existence. Thus the intensity is, 
if r (k) exists over an infinite range, the quantity— 

2 C 2h 

lim T | r (&) cos kp dk , 

* J - 2 * 


and if R (k) exists the parallel quantity— 


v4h< 


r_. 


B (k) cos kp dk. 


The periodogram is thus derivable from the autocorrelation function. Since the latter
does not uniquely determine the series the periodogram will not do so either.
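A finite, discrete analogue of this relation can be verified directly. The sketch below is an assumption-laden illustration rather than the text's own formula: it uses unnormalised lagged products $c_k = \frac{1}{n}\sum_j u_j u_{j+k}$ in place of $r(k)$, and the identity $S^2 = \frac{4}{n}\{c_0 + 2\sum_{k\ge 1} c_k\cos pk\}$, which follows on expanding $A^2 + B^2$.

```python
import math

def intensity_direct(u, p):
    """S^2 = A^2 + B^2 with A = (2/n) sum u_j cos(p j), B = (2/n) sum u_j sin(p j)."""
    n = len(u)
    a_comp = 2.0 / n * sum(x * math.cos(p * j) for j, x in enumerate(u, start=1))
    b_comp = 2.0 / n * sum(x * math.sin(p * j) for j, x in enumerate(u, start=1))
    return a_comp * a_comp + b_comp * b_comp

def lagged_products(u):
    """c_k = (1/n) sum_j u_j u_{j+k} for k = 0 .. n-1 (no mean removal, no scaling)."""
    n = len(u)
    return [sum(u[j] * u[j + k] for j in range(n - k)) / n for k in range(n)]

def intensity_from_lags(c, p):
    """S^2 recovered from the lagged products alone."""
    n = len(c)
    return 4.0 / n * (c[0] + 2.0 * sum(c[k] * math.cos(p * k) for k in range(1, n)))

series = [math.sin(0.7 * j) + 0.5 * math.cos(2.1 * j) for j in range(1, 41)]  # made-up data
p = 2.0 * math.pi / 9.0                                                      # trial period 9
direct = intensity_direct(series, p)
via_lags = intensity_from_lags(lagged_products(series), p)
```

Since the lagged products discard the phases of the series, two different series with the same lagged products necessarily give the same intensities.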


Example 30.9 

Consider the autocorrelation function, which in present notation may be written

$$\frac{p^k\sin(k\theta + \psi)}{\sin\psi}.$$

This, as we have seen, represents the correlogram of an autoregressive series of the simple
linear kind involving $u_{t+2}$, $u_{t+1}$ and $u_t$. We may write this as

$$R(k) = e^{-qk}\,\frac{\sin(k\theta + \psi)}{\sin\psi}, \qquad q > 0,$$

since $p$ is less than unity. It is to be remembered that since $R(-k) = R(k)$, the modulus
of $k$ is to be used when $k$ is negative.

We have 


sin (kO + %p) 


cos kp dk 


cos kO cos kp dk 


q* + (0 +p)*~ T q* + (0 -~P)*' 

This is the intensity in the periodogram of the series, $p$ being the quantity $2\pi/\mu$ and not to
be confused with our original damping factor $p$.


It is remarkable that, as $\mu$ becomes large, $S^2$ tends to the constant value


q* + o »’ 


that is to say, the periodogram tends to a fixed level, without peaks. From the analogy
with the analysis of light-rays into colours (each colour corresponding to a particular
harmonic), we may say that the periodogram develops a "continuous spectrum". In a
very interesting chapter on periodogram analysis Davis (1941) has given a number of
examples exhibiting this kind of effect.

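Significance of a Periodogram

30.49. Suppose that the values $u_1 \ldots u_n$ are random elements from a normal
population with variance $\sigma^2$. Then the function

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j\cos\frac{2\pi j}{\mu}$$

is normally distributed with variance

$$\operatorname{var} A = \frac{4\sigma^2}{n^2}\sum_{j=1}^{n}\cos^2\frac{2\pi j}{\mu} \simeq \frac{2\sigma^2}{n}, \tag{30.79}$$

and similarly

$$\operatorname{var} B \simeq \frac{2\sigma^2}{n}. \tag{30.80}$$

We also see that cov $(A, B) = 0$ so that $A$ and $B$ are independent. Hence the joint
distribution of $A$ and $B$ is

$$dF = \frac{n}{4\pi\sigma^2}\exp\left\{-\frac{n}{4\sigma^2}(A^2 + B^2)\right\}dA\,dB. \tag{30.81}$$

Thus the distribution of $S^2 = A^2 + B^2$ is

$$dF = \frac{n}{4\sigma^2}\exp\left(-\frac{nS^2}{4\sigma^2}\right)dS^2. \tag{30.82}$$

The probability that $S^2$ exceeds $\frac{4\sigma^2\kappa}{n}$ in value is immediately obtainable as $e^{-\kappa}$.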
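Schuster's result, that for a purely random normal series the probability of $S^2$ exceeding $4\sigma^2\kappa/n$ is $e^{-\kappa}$, can be checked by simulation. This is a rough sketch: the sample size, trial period, number of trials and seed are arbitrary choices, and the trial period is taken to divide $n$ so that the variances of $A$ and $B$ are exactly $2\sigma^2/n$.

```python
import math
import random

def simulate_exceedance(n, mu, kappa, trials, sigma=1.0, seed=12345):
    """Fraction of random normal series whose S^2 exceeds 4 sigma^2 kappa / n."""
    rng = random.Random(seed)
    cos_tab = [math.cos(2.0 * math.pi * j / mu) for j in range(1, n + 1)]
    sin_tab = [math.sin(2.0 * math.pi * j / mu) for j in range(1, n + 1)]
    threshold = 4.0 * sigma ** 2 * kappa / n
    hits = 0
    for _ in range(trials):
        u = [rng.gauss(0.0, sigma) for _ in range(n)]
        a_comp = 2.0 / n * sum(x * c for x, c in zip(u, cos_tab))
        b_comp = 2.0 / n * sum(x * s for x, s in zip(u, sin_tab))
        if a_comp * a_comp + b_comp * b_comp > threshold:
            hits += 1
    return hits / trials

estimate = simulate_exceedance(n=120, mu=8, kappa=1.0, trials=3000)
expected = math.exp(-1.0)
```

With a few thousand trials the estimate should agree with $e^{-1}$ to within sampling error.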


30.50. This result is due to Schuster (1898), but it gives only the probability that
a value of $S^2$ chosen at random will exceed a given value; whereas in the periodogram
we deliberately pick out the biggest values for inspection. Walker (1914) pointed out that
if $e^{-\kappa}$ is small the probability that all of $m$ independent values of $S^2$ should not exceed
$\frac{4\sigma^2\kappa}{n}$ is $(1 - e^{-\kappa})^m$, so the probability that at least one should exceed that amount is

$$1 - (1 - e^{-\kappa})^m. \tag{30.83}$$

Davis (1941) gives tables of this function.
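Walker's probability is simple enough to evaluate directly. A minimal sketch follows; the function name is illustrative, and 305 is used as the number of trial periods only because that is the figure quoted for Table 30.9.

```python
import math

def walker_probability(kappa, m):
    """Probability that at least one of m independent intensities exceeds 4 sigma^2 kappa / n."""
    return 1.0 - (1.0 - math.exp(-kappa)) ** m

# Probabilities for 305 independent trial periods at a few values of kappa.
table = {kappa: walker_probability(kappa, 305) for kappa in (2, 4, 6, 8, 10)}
```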

30.51. Both the Schuster and the Walker tests depend on a knowledge of $\sigma^2$. Since
the mean value of $S^2$ in (30.82) is $\frac{4\sigma^2}{n}$, the usual procedure is to consider the test as a
comparison of $S^2$ with $E(S^2)$; but $\sigma^2$ itself has to be estimated from the original data.


30.52. Fisher (1929a) has given a test which avoids the inexactitude due to the
estimation of $\sigma^2$. If $v$ is the estimate and $S^2$ is the largest intensity, then the probability that



. (30.84) 


will exceed a given value is 

$$\nu(1-g)^{\nu-1} - \binom{\nu}{2}(1-2g)^{\nu-1} + \ldots + (-1)^{m-1}\binom{\nu}{m}(1-mg)^{\nu-1}, \tag{30.85}$$

where $\nu = \frac12(n-1)$, $n$ being the (odd) number of observations, and $m$ is the greatest
integer less than $1/g$. The result was extended by Stevens (1939a); see also Fisher (1940a)
and Finney (1941a). Davis (1941) also gives tables of this function.
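The series (30.85) is easily evaluated. The sketch below assumes that $g$ is the ratio of the largest intensity to the sum of all $\nu$ intensities, which is the usual form of Fisher's statistic; the function name is hypothetical.

```python
import math

def fisher_tail(g, nu):
    """Probability that the ratio exceeds g, summing the alternating series.

    Terms are added while 1 - j*g remains positive, i.e. for j up to the
    greatest integer below 1/g (a term with 1 - j*g = 0 contributes nothing).
    """
    total = 0.0
    for j in range(1, nu + 1):
        base = 1.0 - j * g
        if base <= 0.0:
            break
        total += (-1) ** (j - 1) * math.comb(nu, j) * base ** (nu - 1)
    return total

p_two = fisher_tail(0.75, 2)     # with two ordinates: 2 * (1 - 0.75)
p_sure = fisher_tail(0.40, 2)    # the larger of two shares always exceeds 0.4
p_three = fisher_tail(0.50, 3)   # 3 * (1 - 0.5)^2
```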


30.53. All the tests we have described are based on random normal variation in the 
original series; but in practice nobody would embark on the labour of a periodogram 
analysis unless he had satisfied himself that the data were not random. It seems to me, 
therefore, that these tests are really off the main point, being tests based on a hypothesis 
which we have already rejected. They are not without their usefulness, however. We 
may assume with some confidence that if a particular intensity in the series is not shown 
as significant on the hypothesis of random variation, it is not significant when the series 
is systematic. What does not follow is that if one intensity is significant then others must 
be so, even if they exceed the significance values; for they are not independent of the 
significant value, at least for short series. What we ought to do, perhaps, is to extract 
the component which is considered significant from the series and then analyse the 
remainder; and so on as long as significant terms appear. But this is hardly a practical
computational possibility. Tests of significance in the periodogram, as in the correlogram, 
remain undiscovered. 





Example 30.10 

Let us examine the significance of the 20 periods of the Beveridge periodogram given
in 30.44.

Sir William gave the value of $\frac{4\sigma^2}{n}$ in his original paper as 5.898. Expressing the
intensities as a multiple $\kappa$ of this amount, we find:

 Period      κ          Period      κ

  2.735     1.33        11.000     5.74
  3.417     2.69        12.000     3.95
  4.417     2.79        12.800     7.80
  5.100     7.18        15.250    12.91
  5.415     4.01        17.333     9.25
  5.667     5.55        20.000     6.42
  5.933     4.01        24.000     4.43
  7.417     3.68        35.000     3.95
  8.091     3.94        54.000     4.42
  9.750     5.75        68.000     2.30



There are 305 trial periods in Table 30.9. Let us consider the probability that at least
one of 305 independent values of $\kappa$ will exceed given values, that is to say, the probabilities
given by (30.83). We find

  κ     Probability

  2       1.000
  4       0.996
  6       0.531
  8       0.097
 10       0.014

On this basis we should be inclined to attribute significance to the period 15.25, for which
$\kappa = 12.91$. We have no right to be surprised that at least one value exceeds $\kappa = 6$. If
we take this value as the critical one, only the periods 5.100, 12.800, 15.250, 17.333 and
20.000 would be significant, that is to say, five out of 20.

Again, since $e^{-5} = 0.007$, we should expect to find in 305 independent members two
in excess of 5. Actually there are eight. But they are not independent and we cannot
rely on this comparison to say that six are significant. On the whole, however, it looks
as if at least three-quarters of the periods are not significant, and possibly more. The
example will illustrate the difficulty of testing the significance of the periodogram as a whole.
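The counting in this example can be retraced from the twenty corrected intensities listed in 30.44 and Beveridge's value 5.898. The following is a sketch; the variable names are illustrative.

```python
import math

# (period, corrected intensity) for the twenty suggested periods.
peaks = [
    (2.735, 7.82), (3.417, 15.84), (4.417, 16.48), (5.100, 42.34), (5.415, 23.66),
    (5.667, 32.72), (5.933, 23.63), (7.417, 21.72), (8.091, 23.23), (9.750, 33.89),
    (11.000, 33.84), (12.000, 23.30), (12.800, 46.01), (15.250, 76.17), (17.333, 54.55),
    (20.000, 37.88), (24.000, 26.10), (35.000, 23.29), (54.000, 26.09), (68.000, 13.58),
]
unit = 5.898                                   # Beveridge's value of 4 sigma^2 / n

kappas = {period: intensity / unit for period, intensity in peaks}
above_six = sorted(period for period, k in kappas.items() if k > 6.0)
above_five = sorted(period for period, k in kappas.items() if k > 5.0)

# Number expected to exceed kappa = 5 among 305 independent values.
expected_above_five = 305 * math.exp(-5.0)
```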


Lag Correlation 

30.54. The idea of serial correlation can be extended to the joint variation of two
series. If we have two series $u(t)$, $v(t)$ in standard measure, we may define the lag
correlation of order $k$ as

$$r(k) = \int u(t)\,v(t+k)\,dt, \tag{30.86}$$

where the integral includes summation in the case when the series are specified at
equidistant points of time. We note that in this case $r(k)$ is not equal to $r(-k)$ and $r(0)$
is not unity.

Table 30.10 shows the lag correlations between two series of English wheat prices and
horse populations (for the original series see Kendall, 1944a). The data are shown as a lag
correlogram in Fig. 30.10.


TABLE 30.10

Lag Correlations for Two Series of English Wheat Prices and Horse Populations (Deviations
from a Simple Nine-Year Average).

(The order of the correlation is the number of years by which horse population lags behind wheat price,
e.g. r_10 is the correlation of wheat price with the horse population of ten years earlier.)

 Order of                    Order of
Correlation      r_k        Correlation      r_k
    k                           k

   -10          -0.22            1          -0.24
    -9          -0.19            2          -0.36
    -8          -0.24            3          -0.12
    -7          -0.16            4           0.16
    -6          -0.09            5           0.17
    -5           0.07            6           0.39
    -4           0.27            7           0.36
    -3           0.31            8           0.16
    -2           0.41            9          -0.16
    -1           0.26           10          -0.44
     0          -0.12



Fig. 30.10. Lag Correlation of Wheat Prices and Horse Populations (Table 30.10).

The systematic appearance is unmistakable and we notice in particular that the maximum
correlation occurs between the wheat price and the horse population of two years later.
This bears the obvious explanation that when a farmer earns more he buys or breeds more
horses; but it does not follow logically that this must be so or that there need be any
causal nexus between the two series. If two autoregressive series are oscillating with
mean periods which are close together and only a short span of experience is available for
scrutiny, then lag correlations of the damped sinusoidal type may appear, as it were, by
accident.
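For series observed at equidistant points, the lag correlation of (30.86) can be sketched as below. Two details are assumptions of this illustration and are not fixed by the text: each series is standardised over its whole length, and the sum of products is divided by the number of overlapping pairs. The test series are made up.

```python
import math

def standardize(x):
    """Express a series in standard measure (zero mean, unit variance)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((value - mean) ** 2 for value in x) / n)
    return [(value - mean) / sd for value in x]

def lag_correlation(u, v, k):
    """Average of u_t * v_{t+k} over the overlapping terms, both series standardised."""
    us, vs = standardize(u), standardize(v)
    n = len(us)
    products = [us[t] * vs[t + k] for t in range(n) if 0 <= t + k < n]
    return sum(products) / len(products)

# Made-up example: v repeats u three steps later.
u = [math.sin(2.0 * math.pi * t / 12.0) for t in range(120)]
v = [math.sin(2.0 * math.pi * (t - 3) / 12.0) for t in range(120)]

lags = range(-6, 7)
correlogram = {k: lag_correlation(u, v, k) for k in lags}
best_lag = max(correlogram, key=correlogram.get)
```

Here the lag correlogram peaks at $k = 3$, and $r(3)$ and $r(-3)$ differ, illustrating that $r(k)$ is not symmetric in $k$.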


30.55. We have now reached the end of our account of the statistical analysis of 
time-series and the end of this book ; and the final words we have to say of the one will 
apply generally to the other. Much has been left unsaid, partly from lack of space, partly 
from deficiencies in the present state of knowledge, and partly from a desire not to
overburden the reader. We have not avoided mathematical analysis where it was necessary
to advance the argument; but we have insisted on the expression of results in numerical 
form and the necessity of experimental confirmation whenever it could be obtained. That 
there are gaps in the treatment we have given and unexplored branches of the subject 
to which we have barely referred are not entirely matters of regret; for the over-early 
and peremptory reduction of knowledge into arts and methods is one of the errors which 
Bacon cautioned us against more than 300 years ago. Much remains to be done ; and this 
book will have served its purpose if the reader is left with the desire to do some of it himself. 


NOTES AND REFERENCES 

The theoretical aspects of the autoregressive series and of moving averages are dis¬ 
cussed in Wold’s book on The Analysis of Stationary Time-Series ( 1938a). The basic 
memoir is that by Yule (1927a) on sunspots. For applications to meteorolo gy see Walker 
(1931) and to economics Kendall (1944a). Davis’s book on The Analysis of Economic Time 
Series (1941) contains a great deal of interesting material but should not be read uncritically. 
Two earlier papers by Yule (1921 and 1926) are also of interest. See also my paper on 
“ The Analysis of Oscillatory Time-Series ” in the Journal of the Royal Statistical Society 
for 1945, a paper by Yule in the same journal, my brochure (in press) on “ Researches in 
Oscillatory Time-Series ”, and a symposium introduced by Bartlett in the Supplement to 
the Journal for 1946. 

The classical work on periodogram analysis is that of Schuster (1898). The books 
by Brunt (1931) on The Combination of Observations and by Whittaker and Robinson 
(1940) on The Calculus of Observations contain useful introductory accounts ; and Davis’s 
book referred to above has an excellent chapter illustrated with an unusual number of 
examples. Papers by Crum (1923) and Greenstein (1935) are of interest. The papers by 
Sir William Beveridge (1921, 1922) on wheat prices and rainfall have been justly described 
by Davis as a heroic piece of periodogram analysis. Tables facilitating the calculation 
of intensities were published by Turner (1913), and more complete tables will be given in
my brochure referred to above. See also the book by Stumpff (1937). 

Various short-cut methods of periodogram analysis have been proposed by several 
authors, e.g. Oppenheim (1909), Bruns (1921) and Alter (1933, 1937); but their value is 
problematical. There is a useful memoir by Bartels (1935) which is worth studying. 

EXERCISES


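30.1. For the autoregressive series

$$u_{t+2} + a\,u_{t+1} + b\,u_t = \varepsilon_{t+2}$$

show that if $\varepsilon$ is a random variable and the series is long,

$$\frac{\operatorname{var} u}{\operatorname{var} \varepsilon} = \frac{1 + b}{(1 - b)\{(1 + b)^2 - a^2\}},$$

and hence that the variance of the generated series may be much greater than that of
$\varepsilon$ itself.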
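
The result of Exercise 30.1 can be checked numerically. The sketch below assumes the scheme is u(t+2) + a u(t+1) + b u(t) = e(t+2), with e uncorrelated with earlier values of u, and compares the closed form with the ratio obtained from the first two serial correlations. The function names and the constants a = -1.1, b = 0.5 are illustrative choices and are not taken from the text.

```python
def variance_ratio_closed_form(a, b):
    """var u / var e from the closed form stated in Exercise 30.1."""
    return (1 + b) / ((1 - b) * ((1 + b) ** 2 - a ** 2))


def variance_ratio_from_correlations(a, b):
    """The same ratio, built from the serial correlations r1 and r2."""
    r1 = -a / (1 + b)      # r1 + a*r0 + b*r(-1) = 0, with r0 = 1 and r(-1) = r1
    r2 = -a * r1 - b       # r2 + a*r1 + b*r0 = 0
    # Multiplying the scheme by u(t+2) and averaging gives
    # var e = var u * (1 + a*r1 + b*r2).
    return 1.0 / (1 + a * r1 + b * r2)


# Illustrative constants (an assumption, chosen so that the series is damped).
a, b = -1.1, 0.5
print(variance_ratio_closed_form(a, b))
print(variance_ratio_from_correlations(a, b))
```

The two printed values should agree to rounding error; as b approaches 1 the ratio grows without limit, which is the point of the last clause of the exercise.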


30.2. For the autoregressive series of the previous exercise use the relation

$$r_{k+2} + a\,r_{k+1} + b\,r_k = 0, \qquad k \geq -1,$$

to derive the relation

$$r_k = \frac{p^k \sin(k\theta + \psi)}{\sin\psi}.$$
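
A numerical check of the damped-harmonic form may help here. The sketch assumes the usual identifications for the case of complex roots (a² < 4b, 0 < b ≤ 1), namely p = √b, cos θ = -a / (2√b) and tan ψ = (1 + b) tan θ / (1 - b); these definitions are supplied for illustration and are not quoted from the exercise. The function names are hypothetical.

```python
import math


def correlogram_by_recursion(a, b, n):
    """r_0 .. r_n (n >= 1) from r_{k+2} + a r_{k+1} + b r_k = 0, r_0 = 1, r_{-1} = r_1."""
    r = [1.0, -a / (1 + b)]
    for _ in range(n - 1):
        r.append(-a * r[-1] - b * r[-2])
    return r[: n + 1]


def correlogram_damped_harmonic(a, b, n):
    """r_k = p^k sin(k*theta + psi) / sin(psi), under the assumptions stated above."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))            # requires a*a < 4*b
    # atan2 form of tan(psi) = (1 + b) tan(theta) / (1 - b); also valid when b = 1
    psi = math.atan2((1 + b) * math.sin(theta), (1 - b) * math.cos(theta))
    return [p ** k * math.sin(k * theta + psi) / math.sin(psi) for k in range(n + 1)]


a, b = -1.1, 0.5   # illustrative constants, not from the text
for rec, har in zip(correlogram_by_recursion(a, b, 8), correlogram_damped_harmonic(a, b, 8)):
    print(round(rec, 6), round(har, 6))
```

Each printed pair should coincide, showing the correlogram as a harmonic of angular frequency θ damped by the factor p per step.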


30.3. If the estimated coefficients a' and b' in the autoregressive scheme are reduced
in the manner of 30.32 by a superposed error, show that



(Yule, 1927a.) 


30.4. Show that if, in the autoregressive scheme of Exercise 30.1, b = 1, the series
becomes undamped and the correlogram reduces to a simple harmonic. Examine the 
effect on the solution (30.23). 


30.5. If any series has fitted to it a series generated by the scheme of Exercise 30.1,
a and b being any constants, show that for the serial correlations of the residuals, say $\sigma_k$,
we have

$$\sigma_k = \frac{(1 + a^2 + b^2)\rho_k + a(1 + b)(\rho_{k+1} + \rho_{k-1}) + b(\rho_{k+2} + \rho_{k-2})}{1 + a^2 + b^2 + 2a(1 + b)\rho_1 + 2b\rho_2}.$$


30.6. Show that the series with an autocorrelation function

$$r(k) = \frac{\sin \lambda k}{\lambda k}$$

has a periodogram which is zero for periods less than $2\pi/\lambda$ and has a constant ordinate for periods greater
than $2\pi/\lambda$, i.e. has a continuous spectrum.




30.7. In equation (30.71), noting that the dominant term vanishes for a — (i = 

71 

where m is an integer, show that for such a “ vanishing ” trial period (i 

/j, — X^l approximately. 
Hence the width of a peak in the periodogram is approximately —, and the main peak 

TV 

will be flanked by smaller peaks of the same width. (This “ side-band ” effect is another 
complication in the interpretation of the periodogram, but not apparently a very serious 
one.) 


30.8. If a series of values $u_1, \ldots, u_n$ is supplemented by a number of zeros as
$u_0, u_{-1}, u_{-2}, \ldots, u_{n+1}, u_{n+2}$, etc., as far as is necessary, and the resulting series differenced,
show that


r i 





+ 2(- 1 )*P j9 


where x j is the sum of squares of jth differences and P j = 



x k x k+j . 


Hence show that 


the arithmetic of serial correlation may be related to that of the variate-difference method, 
and vice-versa. 


30.9. Show that the serial correlations of a long series obtained by differencing a
random series m times are given by

$$r(k) = (-1)^k \frac{m(m - 1) \cdots (m - k + 1)}{(m + 1) \cdots (m + k)}$$

and hence that the correlogram of such a series oscillates.

(Yule, 1921.)
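
The formula of Exercise 30.9 can be verified exactly for small m. The sketch below assumes the random series consists of uncorrelated terms of equal variance, so that the m-th difference is a moving sum with weights (-1)^j C(m, j); the serial correlation at lag k is then the lagged sum of products of the weights divided by their sum of squares. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def serial_correlation_formula(m, k):
    """r(k) = (-1)^k m(m-1)...(m-k+1) / ((m+1)...(m+k))."""
    num = den = 1
    for i in range(k):
        num *= m - i
        den *= m + 1 + i
    return Fraction((-1) ** k * num, den)


def serial_correlation_direct(m, k):
    """r(k) computed from the weights of the m-th difference operator."""
    w = [(-1) ** j * comb(m, j) for j in range(m + 1)]
    cov = sum(w[j] * w[j + k] for j in range(m + 1 - k))   # empty sum when k > m
    var = sum(x * x for x in w)
    return Fraction(cov, var)


for k in range(5):
    print(k, serial_correlation_formula(3, k), serial_correlation_direct(3, k))
```

The signs alternate with k, which is the oscillation of the correlogram referred to in the exercise, and the correlations vanish for k greater than m.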


30.10. The Whittaker periodogram. Writing 

2 x var m 

V 2 it*) «-> 

var u 

where var u is the variance of the series and var m is the variance of the sums m of (30.73), 
show that if 

Uj = a sin ^ 

where b rj is uncorrelated with periodic terms, then 

aV 2 sin 2 ^- 

--+ ^ var b 

—■ lit —- W4 - 

Hence show that, in the neighbourhood of λ, the graph of η as ordinate with μ as abscissa

2A 2 

(Whittaker’s periodogram) has a peak of breadth flanked by smaller peaks. 

(Whittaker, Month. Notes R. Astr. Soc., 1911, 71; cf. Whittaker and Robinson, Calculus of
Observations.)

ADDENDA TO VOLUME I

(1) Frequency and Distribution Functions 

An interesting paper by Burr (1942) considers the possibility of fitting elementary 
mathematical functions, not to the frequency function as has been the almost universal 
practice hitherto, but direct to the distribution function. This approach seems to merit 
further attention. In general, the distribution function has fewer analytical peculiarities 
than the frequency function—for instance, it cannot be infinite—and in applications to 
sampling it is the former which is nearly always required. The frequency function can, 
of course, be derived from the distribution function to a close approximation by differencing,
or differentiation, processes which are usually easier to carry out than the inverse
processes of integration. 
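
As an illustration of recovering the frequency function by differencing, the sketch below takes one distribution function commonly associated with Burr's paper, F(x) = 1 - (1 + x^c)^(-k) for x > 0, and compares a central difference of F with its exact derivative. The choice of this particular form, the parameter values and the step h are assumptions made for the example only.

```python
def burr_cdf(x, c, k):
    """Distribution function F(x) = 1 - (1 + x**c)**(-k), x > 0 (assumed form)."""
    return 1.0 - (1.0 + x ** c) ** (-k)


def burr_pdf(x, c, k):
    """Exact derivative of burr_cdf with respect to x."""
    return c * k * x ** (c - 1) * (1.0 + x ** c) ** (-(k + 1))


def frequency_by_differencing(cdf, x, h=1e-5):
    """Central difference of a distribution function; h is an illustrative step."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


c, k = 2.0, 3.0    # illustrative parameters
for x in (0.25, 0.5, 1.0, 2.0):
    approx = frequency_by_differencing(lambda t: burr_cdf(t, c, k), x)
    print(x, approx, burr_pdf(x, c, k))
```

The differenced values should agree with the exact frequency function to several decimal places, while F itself stays bounded between 0 and 1 throughout.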

(2) Extension of the Carleman Criterion (4.22) 

Cramér and Wold (1936) have extended Carleman’s criterion for uniqueness in the
problem of moments in the following form:

If

$$\lambda_{2n} = \mu_{2n,0,0,\ldots} + \mu_{0,2n,0,\ldots} + \mu_{0,0,2n,\ldots} + \cdots,$$

the distribution is completely determined by its moments if

$$\sum_{n=1}^{\infty} \lambda_{2n}^{-1/(2n)}$$

diverges. It is rather interesting that the criterion is independent of the product-moments.

(3) Convergence of Series Leading to Standard Errors 

The usual type of expansion in differentials, exemplified in 9.6, raises a point of mathematical
difficulty in that the differentials themselves and the remainder terms, though
usually small, may sometimes be large for sampling reasons, however large the sample. 
The necessary rigorisation of the process has been given by Derkson (1939) in terms of the 
notion of stochastic convergence, that is to say, a sort of statistical convergence in which 
the series converges nearly always in a precisely defined sense. 

(4) Moments of Moments for Finite Populations 

The formulae for moments of the mean and variance in samples from a finite population 
were stated without proof in 11.26. It is obvious that if in these results we let N, the 
population number, tend to infinity, we obtain the formulae for sampling from an infinite 
population. Irwin and I (1944) have recently shown that the process may be reversed 
and the formulae for the finite case derived from those for the infinite case. This offers 
the simplest and most direct method of deriving the formulae known to me. Reference
may also be made to Sukhatme, “On Bipartitional Functions” (Phil. Trans., 1938, A,
237, 375) and “Moments and Product-Moments of Moment-statistics for Samples of the
Finite and Infinite Populations” (Sankhya, 1944, 6, 363).

(5) Tied Ranks

In the treatment of rank correlation in Chapter 16 it was assumed that ranking was
always possible; but in practice cases occur when two or more individuals “tie” and the
ranks have to be equalised in some way. This possibility introduces the most intractable
complications into theoretical work, but sometimes ties occur so frequently that a systematic
method of dealing with them is necessary. The subject has been reviewed and reconsidered
by Woodbury (1940) and more recently by myself (Biom., 1945, 33, part 3).

(6) Coefficients of Rank Correlation 

Daniels (1944) has recently unified the theory of rank correlation by showing that
Spearman’s ρ, my τ and the product-moment coefficient are particular cases of a general
coefficient. In particular he has demonstrated the formula for the covariance of ρ and τ
given in 16.24 as very probably true.

Curvilinear regression, 145-74. Bibl., Mendershausen
(1937a) 477, T. V. Moore (1937)
478; and see Regression.

Cycle, 397-8. See Periodicity. 

Cyclical effects, tests for, 124-7, 370. See Periodicity.

D²-statistic, N.R., 359. Bibl.: Bhattacharya and
Narayan (1942) 446, R. C. Bose (1936a, b)
447, R. C. Bose and Roy (1938c, 1940)
448, S. N. Bose (1935, 1937) 448, Roy
(1939a) 489. See also Discriminatory
Analysis, Multivariate Analysis.

Daly, J. F., on shortest confidence intervals, 82; 
on bias in tests, 323 ; N.R. , 304. 

Daniels, H. E., (Example 23.2) 183-5; rank 
correlations, 441. 

Dantzig, G. B., N.R., 304.

David, F. N., confidence intervals for correlations, 
81 ; N.R., 304. 

Davis, H. T., time-series, 433, 434; N.R., 394, 
437. 

Day, E. E., N.R., 245. 

Death rates, Bibl., Farr (1919, 1920) 460, Pearson
and Tocher (1916c) 485. 

Decomposition of series, Bibl., Andersen (1927) 
443, Smirnoff (1935) 491. See also Time- 
series. 

Decreasing functions, Bibl., C. D. Smith (1939) 
491. 

Degrees of freedom, of “ Student’s ” t, 102; of 
hypotheses, 270. 

De Lury, D., N.R., 137. 

Denumerable probabilities, Bibl., Steinhaus (1923)
492. 

Dependence, see Independence, Correlation. 

Derkson, J. B. D., on stochastic convergence, 440. 

Design, of sampling inquiries, 247-68; preliminary
points, 248-9; stratified sampling,
249-52; design of experiments, 252-4;
orthogonality, 254; replication, 255;
randomisation, 255-6; sensitivity of a
test, 256-7; Latin squares, 257-62; confounding,
262; design and randomisation,
263-6.

Bibl.: Bhattacharya (1943) 446, Christidis
(1931) 451, Fisher (1935c) 462, Jeffreys
(1939e) 471, “Student” (1938) 493, Wold
(1943) 498, Yates (1939e) 502. See also
Blocks, Factorial Experiments, Latin
Squares, etc.

Determinantal equations, Bibl., Girshik (1939) 
465. See also Matrix. 

Deviance, footnote, 178. 

Difference, of two means, test of (equal variances) 
109-11 ; (unequal variances) 111-14. See 
also Behrens' Test, Two Samples. 

-, of two variances, 115-16. 

-, equations, Bibl., Frisch (1932) 463,
Marples (1932) 477. See also Autoregression
Equations.

Differences of variates, Bibl., Irwin (1937a) 470. 

Dilution method, Bibl., R. D. Gordon (1939) 465, 
Matuzewski and others (1935) 477. 

Dirichlet integrals, 298.

Discontinuous variates, Bibl. : dell’ Agnola (1937) 
456; Guldberg (1934) 466, Muench (1938) 
478, H. W. Norton (1937) 481, Ottestad 
(1937, 1938) 481. 

Discordant samples, 128. 

Discriminatory analysis, discriminant function,
341-8. Bibl.: Barnard (1935) 444, Bartlett
(1939c) 445, Dwyer (1942) 458, Fisher
(1936a, 1938c, 1939b, 1940d) 462, P. L. Hsu
(1939b, 1941a, 1941c) 469, H. F. Smith
(1936) 492, Travers (1939) 495, Wallace
and Travers (1938) 498, Welch (1939b) 498,
Wilks (1938d) 500. See also Multivariate
Analysis.

Dispersion, Bibl., Norris (1938) 481. See Variance, 
etc. 

-matrix, 330, 341, N.R., 358.

Dissection of frequency-distributions, Bibl., Burrau
(1934) 450.

Distributed lags, see Lags. 

Distributions, generally, Bibl.: Ambarzumian
(1937) 443, Baten (1933a) 445, (1934) 446,
Bispham (1922) 447, Bochner and Jessen
(1934) 447, Bochner (1937) 447, Bowley
(1933) 448, Burr (1942) 450, Camp (1937)
450, Cannon and Wintner (1935) 450,
Chapelin (1932) 451, Cramér and Wold
(1936) 454, Edgett (1931) 458, Eyraud
(1938a) 459, Glivenko (1933) 465, Guldberg
(1935) 466, Hansmann (1934) 467, Hartman
and others (1937) 467, (1939) 468, Haviland
(1934a, b, 1935, 1939) 468, R. Henderson
(1907) 468, Jessen and Wintner (1935)
471, Khintchine (1937a) 473, Kullback
(1936b) 474, Mazzoni (1934) 477, K. Pearson
(1923c, 1924a) 485, R. Schmidt (1934) 490,
von Mises (1939a) 497.

Dodd, E. L., period generated by moving average, 
384, N.R., 394. 

Doob, J., N.R., 45. 

Dosage-mortality, Bibl., Garwood (1941) 464. 

-response, Bibl., Irwin and Cheeseman (1939) 470.

Dugué, D., N.R., 45.

Duration of play, Bibl., de Finetti (1939b) 456,
Fieller (1931a) 460. 

Eden, T., on Fisher’s distribution, 206, (Example 
23.8) 214, N.R., 216. 

Edgeworth, F. Y., N.R., 45. 

Edwards, J., Integral Calculus, footnotes, 44 and 
50. 

Efficiency, of estimators, 5-7 ; of maximum 
likelihood estimators, 18-19 ; of moments 
in fitting Pearson curves, 43-4 ; of sampling, 
Bibl., Yates and Zacopanay (1935c) 502.

Egg-production, in laying hens, (Table 29.5, 
Figure 29.5) 368. 

Egyptian skulls, (Example 28.3) 345-8. 

Elasticity of demand, Bibl., Mosak (1939) 478, 
Schultz (1933) 490. 

Elderton, E. M., (Example 21.14) 133, N.R., 266. 

Elderton, Sir William P., N.R., 45. 

Electric lamps, testing of, (Example 23.1) 179-80. 

Elimination of variates, in regression analysis, 
167-70. 

Enumeration in sampling, Bibl., Cochran (1939b)
452. 

Equidetectability, curves of, 318.

Equimodal distributions, Bibl., Mouzon (1930) 478. 

Error, in variance-analysis, 187. 

Errors, of first and second kind, 270, (Exercise 
26.5) 305. 

-, general theory of, Bibl.: Brelot (1936,
1937) 449, Campbell (1935) 450, Cramér
(1928) 454, Deming and Birge (1934) 456,
Edgeworth (1905, 1906) 458, Jeffreys (1933,
1937c, 1938d, 1939d) 471, Mahalanobis
(1922) 476, Wertheimer (1932) 499. See
also Least Squares.

Estimation, generally, 1-49, 50-62; in analysis 
of variance, 181, 218-19. 

Estimator, definition, 2; consistence of, 3; bias
of, 3-4; efficiency of, 5-40; sufficiency of,
7-12; approximation to, 22-4; most
general sufficient form, 24-5; accuracy of,
28-9; ancillary, 32-3; in multivariate
case, 33-42; location and scale, 40-2;
by minimum variance, 50-5; by minimum
χ²; by inverse probability, 58-9;
by least squares, 59-60. See also Maximum
Likelihood, Minimum Variance.

Bibl.: Aitken and Silverstone (1942)
443, Beall (1939) 446, S. S. Bose and
Mahalanobis (1938b) 448, Darmois (1935,
1936) 455, O. L. Davies and Pearson (1934)
455, Doob (1936) 457, Dugué (1936a, b,
1937b) 458, Fisher (1925b) 461, (1934d,
1938b, d) 462, Geary (1942, 1944) 464,
Halphen (1939) 467, Neyman (1937b) 480,
E. S. Pearson (1937a, 1939) 483, Pitman
(1937b, 1939a) 486, Wald (1939a) 497.

Expectation of life, see Life.

Expected values, see Mean Values. 

- case, in sociological data, Bibl., Stouffer and 

Tibbits (1933) 493. 

Expenditure of families, (Example 23.9) 214-15. 

Exponential distribution, (Exercise 26.8) 305-6. 
Bibl., Paulson (1941) 482, Sukhatme (1936b)
493. 

Extra-sensory perception, Bibl., Greenwood and 
Stuart (1937) 465, Stevens (1939b) 493.

Extremes, distribution of, Bibl.: Daniels (1941)
455, de Finetti (1932) 455, Dodd (1923)
456, Fisher and Tippett (1928a) 461,
Gumbel (1934, 1935a) 466, McKay (1935)
477, Olds (1935) 481, Tippett (1925) 495.
See also mth Values.

F-distribution (variance ratio), Bibl., Merrington 
and Thompson (1943) 478. See Fisher’s 
Distribution. 

Factor analysis (psychology), Bibl.: Bartlett
(1937e) 445, W. Brown (1935) 449, Burt
(1937a, b, 1938a, b) 450, Camp (1932, 1934)
450, Darmois (1934) 455, Emmett (1936)
459, Hoel (1937, 1939) 468, Irwin (1933)
470, Ledermann (1938) 475, Roff (1937)
489, Thomson (1916, 1919b, 1939) 494,
Thurstone (1935, 1938) 495.

Factorial experiments, 199-202. Bibl.: Barnard
(1936) 444, R. C. Bose and Kishen (1941)
448, Cornish (1936, 1940b, c) 453, Goulden
(1937, 1938) 465, P. L. Hsu (1943) 470,
Kishen (1940) 473, Wishart (1938) 501,
Yates (1937b) 502.

- moments, Bibl., Gonin (1936) 465, Ottestad (1939) 481.

- sums, in fitting regressions, (Example 22.8) 164-5.

Factorisation of variables, Bibl., S. C. Dodd (1927) 457.


Families of alternatives, 275-6. 

Feller, W., N.R., 303.

Fiducial inference, 85-95. Bibl.: Bartlett (1939a)
445, Fisher (1933, 1935a, 1935b, 1936c,
1937b, 1939a, 1940c, 1941a) 462; Garwood
(1936) 464, Ricker (1937) 488, Segal (1938)
491, Wilks (1938b, c) 499, (1939a, b) 500.
See Confidence intervals.

Field experiments, Bibl., Wishart and Saunders 
(1935) 501. See Design. 

Fifteen-constant surface, Bibl., K. Pearson (1925a) 
485. 

Filon, L. N. C., N.R., 45. 

Finite populations, sampling from, Bibl.: Church
(1926) 452, Hansen and Hurwitz (1940)
467, Irwin and Kendall (1944) 470, Isserlis
(1918c, 1931) 470, Neyman (1925) 480,
O’Toole (1934) 481, Sukhatme (1944) 494,
Tschuprow (1918b, 1921, 1923) 495.

Finney, D. J., z-test, 199; test of significance in
periodogram analysis, 434; N.R., 137, 216.

Fisher, R. A., fitting by moments, 43; fiducial
probability, 90; tables for Behrens’ test,
92, 93, 111; expansion of “Student’s”
integral, 101; tables of t, 102; difference
of two means, 110; z-distribution, 116,
117; configuration of a sample, 127;
fitting regressions, 165; theorem on sum
of squares, 176-7; design of experiments,
263; discriminatory analysis (Example
28.2) 342-4; distribution of canonical
correlations, 357; significance of a periodogram,
434; N.R., 45, 61, 83, 94, 136, 173,
216, 245, 266, 359.

Exercises from: (Exercise 17.1) 45, 
(Exercises 17.4, 17.5, 17.6) 46, (Exercise 
17.12, 17.15, 17.16) 48, (Exercise 17.19) 49, 
(Exercise 18.3) 61, (Exercises 20.1, 20.2) 
94-5. 

Fisher’s distribution (z-distribution), properties of,
116-18; in variance analysis, 179, 199;
in non-normal case, 205-6, 234-6, (Example
26.8) 289-91; in linear hypothesis, 301;
in discriminatory analysis, 345.

Bibl.: Aroian (1941) 444, R. A. Chapman
(1938) 451, Cochran (1940a) 452,
Daniels (1938a) 454, Eden and Yates (1933) 
458, Fisher (1924c) 461, P. L. Hsu (1941c) 
469, Lawley (1938) 475, McCarthy (1939) 
477, Paulson (1942) 482, Welch (1937) 498. 

Fitting, see Curve Fitting, Least Squares. 

Flood flows, Bibl., Gumbel (1938a, 1941) 466. 

Fluctuations in time-series, Bibl., R. A. Gordon 
(1937) 465. See Time-series. 

Forecasting, Bibl.: Cowles (1933) 453, Cowles and 
Jones (1937) 453, de Finetti (1937) 456, 
Schultz (1930) 490, Yates (1936c) 502. 

Forsyth, A. R., Calculus of Variations, footnote, 50. 

Fourier analysis, see Harmonic Analysis, Periodicity.

Fragmentary samples, Bibl., Wilks (1932a) 499. 

Frankel, L. R., N.R., 136, 266. 

Freedom, degrees of, see Degrees of Freedom. 

Frequency-distributions, see Distributions. 

Frequency theory of probability, Bibl.: Campbell
(1939) 450, Cantelli (1923, 1932, 1933b) 450,
(1936) 451, Dörge (1934, 1936) 458, von
Mises (1931) 497. See Probability, Random
Sequence.

Friedman, M., (Example 23.9) 214-15. 

Frisch, R., N.R., 358. 

Galton’s problem, Bibl.: Galton (1902) 464, Irwin
(1925a) 470, K. Pearson (1902c) 484. See 
Rank Correlation. 

Gamma distribution, Bibl., Kibble (1941) 473. 
See Type III. 

Garwood, F., confidence intervals for Poisson distribution,
81.

Gauss, K. F., variance of residuals, 60-1; standard
errors, 153; N.R., 45.

Gaussian distribution, see Normal Population. 

Geary, R. C., distribution of t, 102-4 ; test of 
normality, 106 ; theorem on independence, 
118 ; (Exercises 21.1, 21.2) 137-8 ; N.R., 
45, 136. 

Geary’s ratio, Bibl., Geary (1935a, b, 1936a) 464,
Tricomi (1937) 495. 

General factor (intelligence), see Factor Analysis. 

Generalised distance, of Mahalanobis, N.R., 359. 

Generating functions, Bibl., Aitken (1931) 442. 
See Characteristic Functions. 

Geometric Mean, Bibl., Camp (1938a) 450, Norris 
(1938, 1940) 481. 

Germination of wheat-seeds, (Example 23.7) 207-9. 

Gini's mean difference, 108. 

Girshik, M. R., (Exercise 28.11) 362, N.R. , 359. 

Glass, seed in, (Example 23.6) 202-4. 

Goodness of fit, tests of, 106-9. Bibl.: David
(1939) 455, Neyman (1937a) 480, K. Pearson
(1934) 486, Thomson (1919a) 494. See
Chi-squared.

Gosset, W. S. (“Student”), 80, 266, N.R., 394.

Gould, C. E., (Example 23.6) 202-4. 

Goulden, C. H., N.R., 216, 266. 

Grades, see Rank Correlation, Galton’s Problem. 

Graduation, Bibl., Aitken (1933a, b, c) 442, Keyfitz
(1938) 473. See Interpolation, Least
Squares, Orthogonal Polynomials, Trend.

Graeco-Latin square, 261-2. Bibl., R. C. Bose
(1938b) 448.

Gram-Charlier series, estimation in (Exercise 18.1)
61; for non-normal t, 103; goodness of
fit in, 109; in z-distribution, 116. Bibl.:
Aitken and Oppenheim (1931) 442, Aitken
(1932) 442, Aroian (1937) 444, Baker
(1930d, 1935) 444, Charlier (1906, 1912,
1928, 1931) 451, Cornish and Fisher (1937)
453, C. C. Craig (1931b) 454, Cramér (1926,
1935b) 454, Doetsch (1934) 457, Edgeworth
(1905) 458, Gram (1879) 465, Hildebrandt
(1931) 468, Jacob (1933, 1935, 1937) 471,
Meisener (1938) 477, Quensel (1938) 487,
Samuelson (1943) 490, Schmidt (1934) 490,
Steffensen (1930) 492, Wicksell (1917b,
1934a) 499.

Greenstein, B., N.R., 437. 

Grouping corrections, Bibl.: Abernethy (1933)
442, Alter (1939) 443, Baten (1931) 445,
Blümel (1939) 447, Burkhardt and Stackelberg
(1939) 449, Carver (1933, 1936) 451,
C. C. Craig (1936c, 1941b) 454, Elderton
(1933, 1938b) 459, Kendall (1938a) 472,
Lewis (1935) 475, Sandon (1924) 490.

-, effect on correlations, Bibl., Gehlke and Biehl (1934) 464.

-, significance of, Bibl., Stevens (1937b) 493.

Groups of experiments, Bibl., Yates and Cochran
(1938b) 502.

Hampton, W. M., (Example 23.6) 202-4. 

Hansmann, G. H., N.R., 45. 

Harmonic analysis, Bibl.: T. F. Anderson (1935)
443, Brunt (1928) 449, Carslaw (1930) 451,
Fisher (1929a) 461, (1940a) 462, Frisch
(1928, 1931, 1933) 463, Pollak (1926) 487,
Turner (1913) 496, Wiener (1930) 499.
See Periodicity.

- mean, Bibl., Norris (1939) 481. See Mean Values.

Hartley, H. O., on z-test, 199; k samples, 299;
N.R., 137, 216, 304. 

Heads and tails, Bibl., Fieller (1931c) 460. See 
Duration of Play. 

Hendricks, W. A., (Exercise 21.9) 139 ; N.R., 136. 

Hermite polynomials, see Tchebycheff-Hermite
Polynomials.

Heterogeneous populations, Bibl., Baker (1930c,
1932) 444. See also Lexis Theory, Stratified
Sampling.

Hierarchies in correlation, Bibl., Thomson (1916,
1919b, 1935) 494, Wilson (1928) 500. See
Factor Analysis.

Higham, J. A., (Exercise 29.7) 395. 

Highest audible pitch, (Example 22.4) 152-3, 
(Example 22.5) 155-6. 

Hirschfeld, H. O., see Hartley, H. O. 

Homogeneity, Bibl. : Baker (1941) 444, Hartley 
(1940) 467, Welch (1938a) 498. See k 
samples. 

Horse population and wheat prices, 436. 

Hotelling, H., canonical correlations, 348-58; 
(Exercises 28.7-28.10) 360-2; N.R., 45, 
136, 359. 

Hotelling’s T², 323, 336-8; N.R., 359. Bibl.,
Hotelling (1931) 469, P. L. Hsu (1938c) 469.

Hsu, P. L., linear hypothesis, 301; Wishart’s
distribution, 333; canonical correlations,
357; N.R., 304, 359.

Hypergeometric series, Bibl.: Ayyangar (1934)
444, Camp (1926a) 450, O. L. Davies (1933,
1934) 455, Gonin (1936) 465, K. Pearson
(1899b) 483, (1924b, c) 485, Romanovsky
(1925b) 489.

Hypotheses, testing of, see Statistical Hypotheses. 

Imaginary random variables, Bibl., Eyraud (1938b)
459. 

Immunity, Bibl., Brownlee (1905) 449. 

Incomes, distribution of, Bibl., Cantelli (1929) 
450, Darmois (1933) 455. 

Incomplete blocks, see Blocks. 

Independence, of quadratic forms, Bibl.: Cochran
(1934) 452, A. T. Craig (1936a, 1943) 453,
Madow (1940) 476.

-, statistical, Bibl.: del Vecchio (1933) 456,
Kac and van Kampen (1939) 472, Marcinkiewicz
and Zygmund (1937) 477, Tschuprow
(1934) 496. See also Correlation,
Contingency, etc.

Index, distribution of, see Ratio. 

- numbers, Bibl.: Bowley (1926) 448, Claremont
(1916) 452, Crowther (1934) 454,
Dodd (1937c) 457, Edgeworth (1925a, b, c)
459, I. Fisher (1922) 460, Flux (1921, 1933)
463, Frickey (1937) 463, Frisch (1930) 463,
Haberler (1927) 467, Konus (1939) 474,
Persons (1928) 486, Rhodes (1936) 488,
Schultz (1939) 490, Yates (1939c) 502.

Indices, correlation of, Bibl.: Baker (1937) 444,
J. W. Brown and others (1914) 449, Claremont
(1916) 452.

Industrial accidents, Bibl., Newbold (1927) 479. 

-processes, see Quality Control. 

Inequalities, Bibl.: Mortara (1934) 478, Narumi
(1923b) 479, Norris (1935, 1937) 481,
Romanovsky (1938) 489, Shohat (1929)
491, C. D. Smith (1930) 491, von Mises
(1939b) 497, Wald (1938) 497.

Infantile mortality, Bibl., Feld (1924) 460. 

Infection in potatoes, (Example 24.5) 230-2, 
(Example 24.6) 232-3. 

Inference, see Statistical Hypotheses. 

Information, amount of, 29-30; loss of, 30-2;
in minimum χ², 57-8. Bibl.: Bartlett
(1936a, b) 445, Fisher (1934b, 1936a) 462.

Intensity, of a periodogram, 425. 

Interaction, in variance-analysis, 187, 188-9. 

Interference, analysis of, Bibl., Stevens (1936) 493. 

Interpolation, Bibl.: Comrie (1936) 452, Erdös
and Turan (1937, 1938) 459, Feldheim
(1936a) 460, Fisher and Wishart (1927)
461, Gini (1921) 465, Lidstone (1937) 476,
Pietra (1932b) 486, Salvemini (1934) 490,
Simaika (1942) 491, Tchebycheff (1907)
494. See also Graduation, Least Squares,
Orthogonal Polynomials.

Intra-class correlation, 181. Bibl., Harris (1914)
467, Harris and Gunstad (1931) 467.

Intrinsic accuracy, in estimation, 28-9. 

Invariants of frequency curves, Bibl., Zoch 
(1934) 503. 

Inverse probability, in estimation, 58-9; relationship
with fiducial inference, 90-1, 93-4.
Bibl.: Bayes (1763) 446, Fisher (1926c,
1930a) 461, (1932, 1935a) 462, Isserlis (1936)
471, Jeffreys (1937b) 471, Tornier (1937)
495, Wisniewski (1937b) 501.

Iris (flower), (Example 28.2) 342-4. 

Irregular Kollektiv, 123. See Random Sequence. 

Irwin, J. O., (Exercise 23.1) 216-17 ; sampling 
moments, 440 ; N.R., 216. 

Item analysis, Bibl., Merril (1937) 478. 

Iterations, see Runs. 

J-shaped distributions, Bibl., Elderton (1933) 
459, Solomon (1939) 492. 

Jackson, W. R., N.R., 304. 

Jeffreys, H., (Example 18.5) 56-7; fiducial
inference, 90-1, 93-4; N.R., 61, 94, 266.

Jensen, A., N.R., 266. 

Joint sufficiency, 39. 

Judgments, validity of, Bibl., Eysenck (1939) 459. 

k samples, problem of, 119-22, 295-9; bias in,
323, (Exercise 27.2) 326. Bibl.: Bartlett
(1934a) 445, Bishop (1939) 447, Bishop and
Nair (1939) 447, R. C. Bose and Roy (1940)
448, G. W. Brown (1939) 449, Neyman
and Pearson (1931b) 480, Pearson and
Wilks (1933b) 482, Sukhatme (1936b) 493,
(1937b) 494, Welch (1935) 498, Wilks
(1935b) 499. See L-tests.

k-statistics, Bibl.: Fisher (1929b) 461, Fisher and
Wishart (1931) 462, C. T. Hsu and Lawley
(1939) 469, Kendall (1940) 472, (1942b) 473,
Wishart (1929a, b, 1930, 1933b) 500. See
also Moments, sampling.

Kelley, T. L., (Example 28.4) 351-2. 

Kermack, W. O., N.R., 136. 

Keynes, Lord, (Exercise 17.7) 47. 

Kolmogoroff, A., confidence intervals for ter¬ 
minals, 83. 

Kolodzieczyk, St., linear hypothesis, 293; N.R., 
304. 

Koopman, B. O., (Exercises 17.13, 17.14) 48,
N.R., 45.

Koshal, R., N.R., 45. 

Kronecker delta, 329. 

Kurtic curve, 142.

Kurtosis, Bibl., Frisch (1934a) 464.


L-tests, Bibl.: Mahalanobis (1933) 476, Mood
(1939) 478, Nayer (1936) 479, Paulson
(1941) 482, Welch (1936a) 498, Wilks and
Thompson (1937a) 499. See k samples.

Lag correlation, 435-6. 

Lags, distributed, Bibl.: Alt (1942) 443, Koopmans
(1941) 474, K. R. Nair (1936) 479,
Zrzavy (1933) 503.

Lanarkshire milk investigation, N.R., 266.

Large numbers, law of, see Convergence in Probability.

Largest member of a sample, see Extremes. 

-of a set of variances, see Variance ratio. 

Latent roots of a matrix, see Matrix. 

Latin squares, 257-62, 266. Bibl.: R. C. Bose
(1938b) 448, R. C. Bose and Nair (1942b)
448, Euler (1782) 459, Fisher and Yates
(1934c) 462, Fisher (1942d, e) 462, Mann
(1943) 477, H. Norton (1939) 481, Stevens
(1938b) 493, Welch (1937) 498, Yates (1933c)
501, (1936a) 502.

Lattices, distributions on, van Kampen and 
Wintner (19396) 496. 

Lawley, D. N., N.R., 359. 

Least squares, in estimation, 59; in regression
analysis, 145; in time-series, 371. Bibl.:
Adcock (1878) 442, Aitken (1933a, b, c,
1935a) 442-3, Davis (1933) 455, David and
Neyman (1938c) 455, Deming (1931, 1934,
1935, 1937) 456, Hendricks (1931, 1934)
468, E. Johnson (1940) 471, Jones (1937a)
472, Jordan (1932, 1934) 472, Kerrich (1937)
473, Sheffer (1935) 491, Sheppard (1914,
1929) 491, Sterne (1934) 493, Wisniewski
(1937a) 501, Wong (1935) 501.

Lexis, W., ratio, 119; N.R., 216.

- theory, Bibl.: Geiringer (1942) 465, Rider
(1934) 488, Tschuprow (1918, 1919a) 495,
von Bortkiewicz (1931) 497.

Life, expectation of, etc., Bibl. : Brownlee and 
Morison (1911) 449, Dublin and others 
(1935) 458, Greenwood (1922) 466, Gumbel 
(1924, 1925, 1932) 466, Seal (1940) 490, 
Wilson (1938) 500. 

Likelihood, in estimation, see Maximum Likelihood;
in testing hypotheses, 277-80, 295-302,
323-6. Bibl., Fisher (1932, 1934a, b)
462, Wilks (1935a) 499.

Likelihood-ratio tests, Bibl. : Daly (1940) 454, 
Neyman and Pearson (1933c) 480, Wilks 
(1938a) 499, Wilks and Thompson (1937a) 
499. See L-tests. 

Limiting form of significance tests, 322. Bibl., 
Peiser (1943) 486. 


Linear equations subject to error, Bibl., Lonseth
(1942) 476.

- hypotheses, 292-5, 300-2. Bibl., Johnson
and Neyman (1936) 472, Kolodzieczyk
(1935) 474.

Linearity of regression, see Regression. 

Linkage, Bibl., Finney (1940, 1941, 1942) 460, 
N. L. Johnson (1940b) 472.

Link-relatives, Bibl., Robb (1930) 489. See Index 
Numbers. 

Live births, proportion of males among, (Example
21.8) 120.

Location, estimation of parameters of, 40-2; 
centre of, 41 ; Pitman’s tests of, 323-6. 
Bibl., Pitman (1939a, b) 486.

Logarithmic variate, Bibl.: Finney (1941b) 460,
Jenkins (1932) 471, Nydell (1919) 481, 
Pae-Tsi-Yuan (1933) 481, Quensel (1936) 
487, Wicksell (1917a) 499, Williams (1937) 
500. 

Loss of information, in estimation, 30-2. 

- weight in soil, (Example 22.3) 149-52,
(Example 22.6) 158.

m rankings, problem of, (Example 23.9) 214-15. 
Bibl., Friedman (1937, 1940) 463, Kendall 
and Babington Smith (1939b) 472.

Macaulay, F. R., (Exercise 29.4) 395; N.R., 394. 

MacStewart, W., N.R., 304. 

Madow, W. G., N.R., 359. 

Magnetic declination, Bibl., Schuster (1899) 490. 

Magnitude, random division of, Bibl., Fisher 
(1940a) 462, Stevens (1939a) 493. 

Mahalanobis, P. C., N.R., 303, 304, 359.

Males, proportion in births, (Example 21.8) 120; 
marriages of, (Example 21.9) 121-2. 

Markoff, A. A., theorem on least squares, (Exercise 
25.5) 267. 

- process (Markoff chains), Bibl.: Doeblin
(1936, 1937) 457, Elfving (1937, 1938) 459,
Feldheim (1936b) 460, Fortet (1935-8) 463,
Fréchet (1935, 1936b, 1937a) 463, Geiringer
(1938) 464, Hadamard and Fréchet (1933)
467, Hostinsky (1937) 469, Kolmogoroff
(1937b) 473, Lévy (1935b, 1936c) 475,
Markoff (1912) 477, Mihoc (1934) 478,
Onicescu and Mihoc (1935-9) 481, Romanovsky
(1936a) 489, Ščukarev (1932) 490.

Marriage, males according to age at, (Example 
21.9) 121-2. 

- rate in England and Wales, (Table 30.2) 397,
(Example 30.3, Table 30.5, Figure 30.4)
408-9.

Martin, E. S., N.R., 359. 

Mass production, see Quality Control. 

Matching problems, Bibl.: Battin (1942) 446,
D. W. Chapman (1935) 451, J. A. Greenwood
(1938) 465, (1940) 466, Greville (1938,
1941) 466, Olds (1938a) 481, Vernon (1936)
496, Wilks (1932c) 499.

Mathematical Tripos, distribution of women 
obtaining firsts in, (Example 18.5) 56-7. 

Matrix, arithmetic of, Aitken (1937a, b, 1938) 443,
Bingham (1941) 447, Dwyer (1941a, b) 458,
Hotelling (1943) 469.

Maximum likelihood estimators, 12-49; consistence,
13-15; normality, 15-17; variance
of, 17-18; efficiency of, 18-19; sufficiency,
19-20; for several parameters, 34-49;
variance and covariance of, 36-7; relation
with minimum variance, 53, and with confidence
intervals, 73-4.

Bibl.: Carlson (1932) 451, Fisher (1912,
1921a, 1926b, 1928c) 461, (1932, 1934a)
462, Hotelling (1930) 469, Jeffreys (1938b,
1938c) 471, Koshal (1933, 1935, 1939) 474,
Myers (1934) 479, E. S. Pearson (1937a)
483, K. Pearson (1936) 486, Welch (1939c)
499.

McKendrick, A. G., N.R., 136. 

Mean, arithmetic, estimation of, 2; (Example
17.6) sufficient estimator for, 11; (Example
17.7) 19-20; most general distribution for
which it is estimator (Example 17.10) 22;
significance of, 98-100, (Examples 27.1,
27.2) 311-12.

- deviation, in testing normality (Geary’s
ratio), 106; distribution of m.d., Bibl. :
Fisher (1920) 461, Fréchet (1936a) 463,
Tricomi (1936b, 1937) 495.

- difference, 108. Bibl. : Cantelli (1913) 450, 

de Finetti and Paciello (1930b) 455, de
Finetti (1931) 455, U. S. Nair (1936) 479, 
Wold (1935) 501. 

- values, Bibl. : Aumann (1934-5) 444, Bunak 

(1936) 449, A. T. Craig (1936b) 453, Dodd
(1934, 1937a, b, c, 1938) 457, Doodson (1917)
458, Dressel (1941) 458, Norris (1935, 1937)
481, Wertheimer (1937) 499, Yasukawa 
(1925) 501, Zoch (1935, 1937) 503. 

Means, distribution of, Bibl. : Baker (1930d, 1931, 
1932, 1936, 1940) 444, Behrens (1929) 446, 
R. C. Bose (1938a) 448, Carlson (1932) 451, 
Cochran (1937a) 452, A. T. Craig (1932) 
453, Dodd (1926-7) 456, Dunlap (1931) 458, 
Hall (1927b) 467, Holzinger and Church
(1929) 469, Irwin (1927, 1929, a, 1930) 470, 
Immer (1937) 470, Isserlis (1918a) 470, 
Jeffreys (1940) 471, Kolmogoroff (1929) 473, 
Pizzetti (1939) 487, Pollard (1934) 487, 
Rhodes (1927) 488, Romanovsky (1929) 
489, Simon (1943) 491, Truksa (1940) 495. 
See also Central Limit Theorem, Mean 
Values. 

-, test of difference, see Difference ; in multivariate
analysis, 338-41.



Mean-square contingency, see Contingency. 

-successive difference, Bibl.: Hart (1942) 

467, von Neumann and others (1941a, b)
497, J. D. Williams (1941) 500. 

Median, as estimator, 5 ; confidence intervals for, 
(Exercise 19.5) 84. Bibl. : Cisbani (1938) 

452, Doodson (1917) 458, Gini and Galvani 
(1929) 465, Gini (1938) 465, Gini and 
Zappa (1938) 465, Gulotta (1938) 466, 
Haldane (1942b) 467, Hojo (1931, 1933)
469, Jackson (1921) 471, K. R. Nair (1940b)
479, K. Pearson (1931b) 486, Pollard (1934)
487, Savur (1937a) 490, W. R. Thompson 
(1936) 494, Ville (1936c) 496. 

Migration, see Random Migration.

Minimum variance, of maximum likelihood estimators,
18-19 ; in estimation, 50-5.

- χ², in estimation, 55-8.

Missing plot technique, 229-33. Bibl. : Allan 
and Wishart (1930) 443, Cornish (1940a, b)
453, K. R. Nair (1940a) 479, Yates (1933b)
501, Yates and Hale (1939b) 502.

Mode, Bibl. : Doodson (1917) 458, Haldane
(1942b) 467, K. Pearson (1902b) 484,
Yasukawa (1926) 501. 

Moment-function, Bibl., U. S. Nair (1939) 479. 
See Characteristic Functions, Generating 
Functions. 

Moments, efficiency of, 43-4. 

- of distributions (specification), Bibl. : Cornish
and Fisher (1937) 453, Fisher (1937a)
462, R. Henderson (1907) 468, O’Toole
(1933) 481, Pearl (1937) 482, K. Pearson
(1936) 486, Romanovsky (1936b) 489, von
Mises (1937) 497. See Curve Fitting.

-, problem of, Bibl. : Bodewadt (1936) 447, 

Broggi (1934) 449, Chlodovsky (1938) 451, 
Hamburger (1920, 1921) 467, Haussdorf 
(1923) 468, Haviland (1935, 1936) 468, 
Marcinkiewicz (1939) 477, Pólya (1920,
1938a) 487, Stekloff (1914) 492, Stieltjes 
(1918) 493, Widder (1934) 499. 

-, sampling, Bibl. : Bernstein (1932) 446, 

C. C. Craig (1928) 453, (1940) 454, Dwyer 
(1937a, 1938, 1940) 458, Fisher (1929b)
461, Fisher and Wishart (1931) 462, Geary
(1933) 464, Irwin and Kendall (1944) 470,
Isserlis (1918b, c, 1931) 470, St. Georgescu
(1932) 493, Sukhatme (1938c, 1944) 494,
Tschuprow (1918b, 1921, 1923) 495, Wilks
(1934, 1936) 499, Wishart (1929a, b, 1930,
1931a, b, 1933b) 500, Wishart and Bartlett
(1932b) 500, Ziaud-din (1938) 503. See
also k-statistics.

Monotonic functions, in distribution theory, Bibl., 
Bochner (1937) 447. 

Mood, A. M., N.R., 304. 

Moore, G., phases in time-series, 126 ; N.R., 136.

Morant, G., N.R., 394.

Morgan, W. A., N.R., 137.

Mortality, see Life.

Most-efficient estimator, 6, 10, 18-19. 

Most-selective confidence intervals, 75, 82.

Moths, effect of weather on, (Example 22.10) 
171-2. 

Moving averages, 372-87, 399. Bibl. : Dodd 
(1939a, 1941a, b) 457, Frisch (1938) 464, 
Wold (1938b) 501.

mth values, Bibl., Gumbel (1934, 1935a, 1939) 
466. 

Multinomial distribution, Bibl., Kullback (1937) 
474, Lurquin (1937) 476. 

Multiple correlation, Bibl. : Bacon (1938) 444, 
R. C. Bose (1934) 447, Fisher (1928b) 461,
Hall (1927a) 467, Kelley and McNemar
(1929) 472, Kullback (1936c) 474, K. Pearson
and Lee (1908) 484, K. Pearson (1916d)
485, K. Pearson and Young (1918) 485,
Soper (1929a) 492, Starkey (1939) 492,
Tappan (1927) 494, Wilks (1932b) 499,
Wishart (1931b) 500, Wong (1937) 501.

- curvilinear regression, 167, 236. See Regression.

- happenings, Bibl., Greenwood and Yule 

(1920) 466, K. Pearson (1912b, 1913) 484.
See Poisson Distribution, Pólya Distribution.

Multivariate analysis, 328-62 ; Wishart’s distribution,
330-4; Hotelling’s distribution,
335-8 ; significance of set of means, 338-41 ;
discriminatory analysis, 341-8;
canonical correlations, 348-58.

Bibl. : Bartlett (1939b, 1941) 445, Bishop
(1939) 447, Fisher (1936a, b, 1938c, 1939b,
1940d) 462, Hotelling (1933, 1936a, b) 469,
P. L. Hsu (1939b, 1941a, c, d) 469, Madow
(1937, 1938) 476, Mahalanobis (1930, 1936a)
476, Mahalanobis and others (1936b) 476,
Martin (1936) 477, Rider (1936) 488, Roy
(1938, 1939a, b, 1942a, b) 489, Simonsen
(1937) 491, Wald and Brookner (1941b)
498.

- distributions, estimation in, 33-7 ; normal, 

see Normal. Bibl. : Leser (1942) 475, 
Lukomski (1939) 476, Mahlmann (1935) 

477. See also Multiple Correlation. 

Myers, R. J., N.R., 45. 

Nair, K. R., confidence intervals for median, 81, 
N.R. , 83. 

Nayer, P. N., testing hypotheses, 299 ; N.R., 304. 

Negative binomial, Bibl., Fisher (1941b) 462,
Greenwood and Yule (1920) 466. See Pólya
Distribution.

Neyman, J., confidence intervals, 75-6 ; Behrens’ 
test, 93; randomised blocks, 214 ; theory 


of tests, 270, 299, 308, 311, 323; Exercises 
from: (Exercises 19.2, 19.3) 83, (Exercise 
21.12) 140, (Exercises 26.2, 26.3) 304, 
(Exercises 26.4, 26.5) 305, (Exercise 27.3) 
327. N.R., 45, 83, 94, 136, 172, 266, 303, 
304, 326. 

Nisbet, S. D., (Example 25.1) 258-9. 

Non-central confidence intervals, 66. 

- t, Bibl., N. L. Johnson and Welch 

(1940a) 471. 

Non-normal data, in variance-analysis, 205-15. 

-populations, Bibl. : Baker (1934) 444, 

Bartlett (1935a) 445, C. C. Craig (1941a) 
454, Geary (1936b) 464, Laderman (1939)
474, A. N. K. Nair (1942) 479, Pearson and
Adyanthaya (1928, 1929) 482, E. S. Pearson
(1931b) 482, Rider (1931a) 487, Rietz (1932,
1939) 488, Thorndike (1937) 494. 

Non-orthogonal data, Bibl. : K. R. Nair (1942) 
479, Wilks (1938c) 500, Yates (1934a) 501. 

Non-parametric tests, 322. Bibl., Scheffé (1943)
490. 

Non-random samples, Bibl., “Student” (1909) 
493. 

Nonsense correlations, Bibl., Yule (1926) 503. 

Normal equations, solution of, Bibl., Hoel (1941) 

468. 

-population, estimation of mean, 2, (Example 

17.6) 11, (Example 17.7) 19-20, (Example 
18.1) 51; estimation of variance, (Example 
17.6) 11, (Example 18.4) 54-5; centre of 
location of, (Example 17.22) 42 ; confidence 
intervals for mean, (Example 19.1) 63-4, 
(Example 19.3) 70; fiducial distribution, 
85; bivariate, (Example 17.17) 33-4, 

(Example 17.18) 37-8; regressions of, 

(Example 22.1) 144. 

Bibl. : Baker (1931) 444, Bergstrom
(1918) 446, Cramer (1923, 1936) 454, Erdös
and Kac (1939) 459, Haldane (1942a, b)
467, C. T. Hsu (1940, 1941) 469, Isserlis
(1918b) 470, Kac (1939) 472, Khintchine
(1935) 473, Kullback (1935a) 474, Ledermann
(1939) 475, Lehmann (1939) 475,
Lengyel (1939) 475, K. Pearson (1924c) 485,
Pólya (1923) 487, Raikov (1938) 487,
Rhodes (1928) 488, Tricomi (1935, 1936a,
1936b) 495, Yule (1938b) 503.

Normalisation of frequency functions, Bibl. : 
Cornish and Fisher (1937) 453, Haldane 
(1938) 467, Mahalanobis and others (1936b)
476, Paulson (1942) 482. 

Normality, tests of, 105-6. Bibl. : Fisher (1930b)
461, Geary (1935a, b, 1936a) 464, Geary
and Pearson (1938) 464, E. S. Pearson 
(1930, 1935c) 482, Yasukawa (1934) 501. 

Nuisance parameters, 134. Bibl., Hotelling (1940) 

469. 
Olds, E. G., N.R., 266.

Omega, for testing goodness of fit, 107-9. Bibl., 
Smirnoff (1936) 491. 

One-sided confidence intervals, 76. 

Oppenheim, S., N.R., 437. 

Order, in random series, 122-4, and see Random 
Order. 

Orthogonal data, in variance-analysis, 219, 254.

- polynomials, 146-54,159-67. Bibl. : Aitken 

(1932, 1933a, b, c) 442, Allan (1930) 443,
Dieulefait (1934b) 456, Fisher (1921b, 1924b)
461, Greenleaf (1932) 465, Jackson (1934,
1937, 1938) 471, Jordan (1932) 472, Lidstone
(1933) 476, Romanovsky (1927) 489, Sansone
(1933) 490, Shohat (1935) 491, C. D.
Smith (1939) 491, Tartler (1935) 494, 
Tchebycheff (1907) 494, Webster (1938) 
498, Wishart (1933a) 500, Wong (1935) 501. 

-transformations, Bibl., Landahl (1938) 474, 

Ledermann (1938) 475. 

Oscillations, in time-series, 369, 370, 380, 397-8. 
See Periodicity. 


p-statistics, Bibl., Roy (1939b, 1942a) 489. See
Multivariate Analysis. 

Px n test, see Combination of Tests. 

Paired comparisons, Bibl., Kendall and Babington 
Smith (1940) 472. 

Parameters, estimation of, see Estimation. 

- of location and scale, 40-2. 

Partial correlations, Bibl. : Isserlis (1914, 1916) 
470, Stouffer (1934) 493, Subramanian 
(1935) 493. 

Pasteurised milk, in feeding, (Example 21.14) 133. 

Path coefficients, Bibl., Engelhart (1936) 459, 
Wright (1934) 501. 

Paulson, E. A., z-distribution, 118 and N.R., 136. 

Peaks, in time-series, 124. 

Pearson distributions, moments in fitting, 43-4 ; 
sufficient estimators in (Exercise 17.18) 49.
Bibl. : Ambarzumian (1937) 443, Baker
(1940) 444, Beale (1937) 446, C. C. Craig
(1936b) 454, Dieulefait (1935b) 456, Fisher
(1921a) 461, Hildebrandt (1931) 468, Irwin
(1930) 470, K. Pearson (1894, 1895, 1901b)
483, (1916a) 484, (1924a) 485, Romanovsky 
(1924) 489, Wishart (1926) 500. See also 
Type I, etc. 

Pearson, E. S., confidence intervals for binomial, 
81; t in non-normal case, 103; test of 
normality, 106; z in non-normal case, 
205; (Exercise 23.4) 216-17 ; analysis of 
covariance, 238 ; (Exercises 26.2, 26.3, 26.4, 
26.6) 304-5; N.R., 45, 83, 136, 137, 245, 
266, 303, 304, 359. 

-, K., (Example 21.14) 133; N.R., 45, 137,
172, 173, 394. 


Peas, yields of, (Example 23.5) 200-2. 

Periodicity and periodogram analysis, 423-5, 
432-3, 433-5. Bibl. : Alter (1924, 1925,
1926a, b, 1933, 1937) 443, Beveridge (1921,
1922) 446, Bradley and Crum (1939) 449,
Brownlee (1924b) 449, Bruns (1921) 449,
Brunt (1925, 1928) 449, Buys-Ballot (1847)
450, J. I. Craig (1916) 454, Crum (1923,
1925) 454, Dodd (1930) 456, (1939a, b,
1941a, b) 457, Frisch (1928, 1931, 1933)
463, Greenstein (1935) 465, Hersch (1934)
468, Kalecki (1935) 472, Koopmans (1940)
474, Kuznets (1929, 1933) 474, Larmor and
Yamaga (1917) 475, Mitchell (1913) 478,
Mitchell and Burns (1935) 478, Moore (1914,
1923) 478, Moulton (1938) 478, Oppenheim
(1909) 481, Pietra (1925) 486, Pollak (1927)
487, Pollak and Kaiser (1935) 487, Powell
(1930) 487, Savur (1941) 490, Schuster
(1898, 1899, 1906) 490, Soper (1929b) 492,
Starkey (1939) 492, Stumpff (1926, 1937) 
493, Tinbergen (1937, 1938) 495, Tintner 
(1935) 495, Trachtenberg (1921) 495, Vinci 
(1934) 496, Walker (1914, 1925, 1927, 1931) 
498, Wallis and Moore (1941) 498, Yule 
(1927a) 503. See also Harmonic Analysis, 
Time-series. 

Phases, in time-series, 124, 125-6. 

Pilot sampling, 252, N.R., 266. 

Pitman, E. J. G., tests of significance, 128-32,
136; z-test, 211; tests of hypotheses,
323-6; Exercises from, (Exercises 17.9,
17.10, 17.11) 47, (Exercise 21.3) 138,
(Exercise 21.15) 140, (Exercise 27.2) 326.
N.R., 45, 137, 216.

Plant breeding, Bibl., Y. Tang (1938) 494. 

Plot arrangements, Bibl., Tedin (1931) 494. See 
Design. 

Poisson distribution, (Example 17.9) 21-2; confidence
intervals for, (Example 19.4) 70-1,
81 ; conditional test for, (Example 21.12)
127 ; in variance-analysis, 206-7.

Bibl. : Ackermann (1939) 442, R. A.
Chapman (1938) 451, Cochran (1936a,
1940b) 452, Copeland and Regan (1936) 453,
Doetsch (1934) 457, Fisher and others
(1922c) 461, Garwood (1936) 464, Irwin
(1935, 1937a) 470, Lévy (1937a) 476, Lüders
(1934) 476, Molina (1942) 478, Poisson (1837)
487, Przyborowski and Wileński (1940) 487,
Raikov (1936) 487, Ricker (1937) 488,
Satterthwaite (1943) 490, “ Student ” (1907,
1919) 493, Sukhatme (1937b, 1938a) 494,
von Bortkiewicz (1898, 1910) 496, Weida
(1935) 498, Whitaker (1914) 499.

Poisson’s theorem in probability, Bibl., Bochner
(1936) 447, Bonferroni (1933) 447. See 
Central Limit Theorem. 
Pólya distribution, Bibl., del Chiaro (1936) 456,
S. Guldberg (1935) 466. See Negative
Binomial.

Polychoric correlations, Bibl., Pearson and Pearson
(1922b) 485, Ritchie-Scott (1918) 489.

Polynomials, expansions in, Bibl., Cacciopolli
(1932) 450, Davis (1933) 455. See Orthogonal
Polynomials, Curve Fitting.

Population of England and Wales, (Example 
22.7) 161-3, (Examples 22.8, 22.9) 164-7,
(Table 29.2, Figure 29.2) 365.

- analysis, Bibl. : Lotka (1938, 1939) 476,
Pearl and Reed (1923) 482, Volterra (1936)
496.

Potato yields, (Example 21.11) 126. 

Power of a test, 272, 307-8. Bibl. : G. W. Brown 
(1939) 449, Dantzig (1940) 455, Eisenhart 
(1938) 459, MacStewart (1941) 476, Simaika 
(1941) 491, P. L. Hsu (1941b) 469, P. C.
Tang (1938) 494. See also Statistical
Hypotheses.

Powers of normal variates, Bibl., Haldane (1942a) 
467. 

Prediction, see Forecasting. 

Pretorius, S. J., N.R., 173.

Principal components, Bibl. : Girshik (1936) 465, 
Hotelling (1933, 1936a) 469, Landahl (1938)
474, Ledermann (1938) 475, Thurstone
(1935) 495. 

Probability, Bibl. : Bartlett (1933b) 445, Beck
(1936) 446, Belardinelli (1934) 446, Borel
(1939) 447, Broderick (1937) 449, Cantelli
(1932, 1933b) 450, Castelnuovo (1932) 451,
Cramer (1937, 1938, 1939) 454, de Finetti
(1933a, b, 1939a) 456, Doeblin (1938) 457,
Doob (1934b, 1941) 457, Eggenberger (1924)
459, Erdélyi (1937) 459, Khintchine (1937b)
473, Kolmogoroff (1931, 1933a) 473, Lévy
(1931a, 1931c, 1936a, 1937a, 1938a) 475,
Lomnicki (1923) 476, Marchand (1937) 477,
McKinsey (1939) 477, Moisseiev (1937) 478,
Nagel (1936) 479, Reichenbach (1937) 488,
Rice (1938) 488, Romanovsky (1931a) 489,
Tornier (1929, 1930, 1936, 1937) 495, von
Mises (1919a, b, 1928, 1931, 1936a, b, 1939c,
1941) 497, Urban (1918) 496, Uspensky 
(1937) 496. 

Probits, Bibl., Bliss (1935, 1937) 447. 

Product, distribution of, Bibl., C. C. Craig (1936a) 
454. 

Product-moment correlation, see Correlation. 

Proficiency test of recruits, (Example 24.7) 240-2. 

Proportionate frequencies, in variance-analysis, 228.

Proportions, tests of, Bibl., Swaroop (1938) 494. 

Quadratic forms, see Independence of Quadratic 
Forms. 

Quality control, Bibl.: Becker and others (1930)
446, Jennett and Welch (1939) 471, E. S.
Pearson (1933a, 1934) 482, Shewhart (1931)
491, Simon (1941) 491, Welch (1936b) 498,
Wilks (1941) 500, Wolfowitz (1943) 501.

Quartiles, Bibl., Hojo (1931, 1933) 469. 

Quasi-Latin squares, Bibl., Yates (1937a) 502. 

Quasi-sufficiency, Bibl., Bartlett (1940) 445. See 
Conditional Statistics. 

Racial likeness, N.R., 358. Bibl., Morant (1939)
478, K. Pearson (1926b) 485. See Multivariate
Analysis.

Rainfall in London, (Table 29.4, Figure 29.4) 367. 

Random component in time-series, 369 ; effect of 
trend-elimination on, 378-87 ; tests for, 
399. 

- migration, Bibl., Brownlee (1911) 449. 

- occurrences, Bibl., Morant (1921) 478. 

- order, tests of, 122-7. Bibl. : (runs, etc.)
André (1884) 444, Besson (1920) 446, Borel
(1933) 447, Denk (1936) 456, Fisher (1926b)
461, Gumbel (1943a) 466, Jones (1937c)
472, Kaucky (1936) 472, Mood (1940) 478,
von Bortkiewicz (1915a, 1917) 496, von
Mises (1921) 497, Wolfowitz (1943) 501.

- paths, Bibl., McCrea (1936) 477, Pólya
(1938b) 487.

-samples, tables of, Bibl., Mahalanobis and 

others (1934) 476. 

- sampling numbers, Bibl. : Kendall and
Babington Smith (1939a) 472, K. R. Nair
(1938a) 479, Yule (1938a) 503.

- sequence, Bibl. : Copeland (1928, 1929, 

1932, 1936, 1937) 453, Ddrgo (1934, 1936) 
458, Greville (1939) 466, Regan (1936, 
1938) 487, Rice (1939) 488, Swed and 
Eisenhart (1943) 494, Ville (1936a, b) 496,
von Mises (1931, 1933) 497, Wald (1936b,
1937) 497, Young (1941) 502.

- variables, Bibl. : Cramer (1935a) 454, Cramer
and others (1938) 454, de Finetti (1929)
455, Eyraud (1938b) 459, Lévy (1934,
1935a, b, 1936c, 1939a, b) 475. See Probability.

Randomisation, and z-test, 209-13, 255-6; in
design, 263-6. Bibl., E. S. Pearson (1937b,
1938) 483 ; and see Design.

Randomised blocks, 213-14. Bibl. : Cornish 

(1940a) 453, McCarthy (1939) 477, Welch 
(1937) 498. See Blocks. 

Randomness, Bibl. : Borel (1937) 447, Dodd 
(1942) 457, Kendall (1941) 472, Kermack 
and McKendrick (1936, 1937) 473, Wiener
(1938) 499. 

Range, test of, (Exercise 27.3) 327. Bibl. : Geary 
(1943) 464, Hartley (1942) 467, McKay and 
Pearson (1933) 477, Newman (1939) 480, 
Olds (1935) 481, E. S. Pearson (1926, 1932) 





482, Pearson and Haines (1935a) 482, 
Pearson and Hartley (1942, 1943) 483, 
Romanovsky (1933b) 489, W. R. Thompson
(1938) 494, Tippett (1925) 495. 

Rank correlation, 123, 441. Bibl. : Daniels (1944) 
455, Dantzig (1939) 455, Dubois (1939) 458, 
Hotelling and Pabst (1936c) 469, Kendall
(1938b, 1942a) 472, Kendall and others
(1939, 1939b) 472, Olds (1938b) 481, K.
Pearson (1914, 1921) 484, Pearson and 
Pearson (1931c, 1932) 486, “Student” 
(1921) 493, Wallis (1939) 498, Watkins 
(1933) 498, Woodbury (1940) 501. 

Ratio, distribution of, Bibl. : C. C. Craig (1929b)
453, Curtiss (1941) 454, Fieller (1932b) 460,
Geary (1930) 464, Gordon (1941) 465,
Hirschfeld (1937) 468, Kullback (1936a)
474, Nicholson (1941) 481, van Uven (1932,
1939) 496. 

Rectangular distribution, estimation of extremes, 
(Example 17.15) 28; intrinsic accuracy, 
(Example 17.11) 47 ; estimation by sample-centre,
(Exercise 17.16) 48; confidence
intervals for range, (Exercise 19.1) 83.
Bibl. : O. L. Davies (1932) 455, Dunlap
(1931) 458, Hall (1927b) 467, Olds (1935)
481, Rietz (1931a) 488. 

Region of acceptance, 63, 76, 270. 

Regression, Gauss’ theorem on residuals, 60-1 ;
generally, 141-74; analytical theory,
141-5 ; fitting of curvilinear regressions,
145-53 ; standard errors and tests of significance,
153-8 ; equal steps of variate,
159-67 ; multiple curvilinear, 167 ; addition
of new variates, 167-72 ; in analysis
of variance, 233-6 ; relation with Hotelling’s
T, 336-7 ; in discriminatory analysis, 344-5.

Bibl. : R. G. D. Allen (1939) 443, H. V.
Allen (1938) 443, Andersson (1932) 443,
(1934) 444, Bartlett (1933a, 1938c) 445, F.
Bernstein (1937) 446, Blakeman (1905) 447,
S. S. Bose (1934a, b, 1938b) 448, Camp
(1925b) 450, Cochran (1938a) 452, Dodd
(1937b, c) 457, Dwyer (1937b, 1941c) 458,
Eisenhart (1939) 459, Ezekiel (1930b) 460,
Fisher (1922b) 461, Galton (1886) 464,
Jones (1937b) 472, Koopmans (1937) 474,
Mendershausen (1937a) 477, T. V. Moore
(1937) 478, Neyman (1926) 480, K. Pearson
(1896) 483, (1921, 1926a) 485, Quensel
(1936) 487, Richards (1931) 488, Romanovsky
(1926, 1931b) 489, Slutzky (1914)
491, K. Smith (1918) 492, Waugh (1942)
498, Welch (1935) 498, Wicksell (1934b)
499, Yates (1939d) 502, Yule (1936) 503.

-coefficients, standard error of, 153-6 ; exact 

tests of, 156-8. 

Regular unbiassed critical regions, 318-19. 


Rejection of observations, Bibl. : Irwin (1925b)
470, Pearson and Chandra Sekhar (1930) 
483, Rider (1933) 488, W. R. Thompson 
(1935) 494. 

Relaxed oscillations, Bibl., Le Corbeiller (1933) 
475, van der Pol (1930) 496. 

Reliability coefficients, Bibl., Stouffer (1936b) 493.

Replication, 255. Bibl. : Bartlett (1938a) 445,
Cochran (1937b, 1938b, 1939a) 452, Yates
(1933a, b) 500, (1936d) 501. See Design.

Representative method of sampling, Bibl. : A. T.
Craig (1939) 453, Jensen (1925) 471, Neyman
(1933b, 1934) 480, Sukhatme (1935)
493.

Residual, in variance-analysis, 178, 185-7. 

Ricker, W. E., confidence intervals for Poisson 
distribution, 81. 

Riemann zeta-function, Bibl., Jessen and Wintner 
(1935) 471. 

Risk, theory of, Bibl., Cramer (1923) 454, Esscher 
(1932) 459. 

Robinson, G., N.R., 394, 437.

Roots of equations, distribution of, Bibl., Girshik 
(1939, 1942) 465. 

Routine analysis, Bibl. : Neyman (1939b, 1941b)
480, Przyborowski and Wileński (1935b)
487, “Student” (1927) 493.

Roy, S. N., distribution of canonical correlations,
357 and N.R., 359.

Runs, in time-series, see Random Order. 

Sampling distributions, moments of, see k-statistics,
Moments.

- inquiries, see Design.

-, miscellaneous, Bibl. : Bartky (1943) 445,
Bartlett (1937b) 445, Baten (1933b) 446,
Bowley (1925) 448, Burks (1933) 450, Clapham
(1931, 1936) 452, Cochran (1936b,
1939b, 1942b) 452, A. T. Craig (1933a, b)
453, C. C. Craig (1931a) 453, Crum (1933)
454, David (1938b) 455, Hey (1938) 468,
Hilton (1924, 1928) 468, Kiser (1934) 473,
McKay (1934) 477, Neyman (1933a, 1934,
1938a) 480, Olds (1939, 1940) 481, Panse
(1939) 482, E. S. Pearson (1933a, 1934)
482, Pepper (1929) 486, Rhodes (1925) 488,
Rider (1931b) 488, Rietz (1937) 488, Shewhart
and Winters (1928) 491, “ Sophister ”
(1928) 492.

- surveys, Bibl., A. N. Bose (1941) 447, C.
Bose (1943) 447; and see Sampling, miscellaneous.

Sasuly, M., N.R., 394.

Savur, S. R., N.R., 83.

Scale, estimation of parameters of, 40-2 ; elimination
of parameters of, 79-80; Pitman’s
tests of, 323-6. Bibl., Pitman (1939a, b)
486.

Scale, reading, Bibl., Yule (1927b) 503.

Scales of measurement, Bibl., Cochran (1943) 452.

Scatterance, N.R., 358.

Scedastic curve, 142.

Scheffé, H., non-parametric tests, 322 ; N.R.,
304, 328.

Schoolchildren, tests of, (Example 25.1) 258-9,
(Example 28.4) 351-2.

Schultz, H., N.R., 394.

Schuster, Sir Arthur, significance of periodogram,
434; N.R., 437.

Seasonal effect, in time-series, 369. Bibl. : Bowley
and Smith (1924) 448, Carmichael (1931)
451, Carver (1932) 451, Crum (1925) 454, 
Detroit Edison Co. (1930) 456, Donner 
(1928) 457, Falkner (1924) 460, Gressens 
(1925) 466, Mendershausen (1937b) 478, 
Robb (1929, 1930) 489, Wald (1936a) 497, 
Wisniewski (1934) 501, Zrzavy (1933) 503. 

Second Limit Theorem, Bibl., Fréchet and Shohat
(1931) 463. 

- moment, see Variance. 

Seed in optical glass, (Example 23.6) 202-5. 

Seeds of wheat, germination of, (Example 23.7) 
207-9. 

Selective confidence intervals, 75-6. 

Semi-normal distribution, Bibl., Steffensen (1937) 
492. 

Seminvariants, see Cumulants, k-statistics.

Sensitivity, of tests of significance, 256. 

Serial correlation, 402-4. See Correlogram. Bibl.: 
R. L. Anderson (1942) 443, Bartlett (1935c) 
445, Dixon (1944) 456, Kendall (1944a, b) 
473, Koopmans (1942) 474, Marples (1932) 
477, Schumann and Hofmeyer (1942) 490,
Yule (1921) 502, (1926, 1927a) 503. 

Sheep population of England and Wales, (Table 
29.3, Figure 29.3) 366, (Example 29.5) 
385-6, (Example 30.5) 411, (Example 30.8) 
416-18. 

Sheppard’s corrections, see Grouping Corrections. 

Shortest confidence intervals, 71-5, 75-6. 

Significance tests, 96-140, 269-327. See Statistical
Hypotheses. Bibl., Jeffreys (1938a) 471, 
Peiser (1943) 486. 

Silverstone, H., minimum variance, 61; (Exercises
18.1, 18.2) 61.

Simaika, J., N.R., 304, 359. 

Similar regions, 283. Bibl., Feller (1938) 460. 

Simon, L. E., N.R., 61. 

Simple hypotheses, 269, 272-82, 317-26. 

Simultaneous estimation, of several parameters, 
34-44. 

-fiducial distributions, Bibl., Bartlett (1939a) 

445. 

Sinusoidal limit, N.R., 394. Bibl.: Marsueguerra 
(1936) 477, Romanovsky (1931c, 1932a, 
1933a) 489, Slutzky (1937b) 491. 


Skewness, Bibl., Frisch (1934a) 464, Garnet (1932) 
464. 

Skulls (Egyptian), (Example 28.3) 345-8.

Slutzky, E., N.R., 394, 399. 

Slutzky-Yule effect, 378-87, 399. Bibl., Slutzky 
(1937b) 491, Yule (1921) 502. 

Small numbers, law of, see Poisson Distribution. 

Smirnoff, N., ω²-test, 109.

Smith, H. Fairfield, N.R., 359. 

-, K., minimum-χ², 55 and N.R., 81.

Smoothing, see Moving Averages, Trend. 

Soil, loss of weight in, (Example 22.3) 149-52,
(Example 22.6) 158. 

Solomon, L., footnote, 51. 

Spearman, C., (Exercise 25.3) 267. 

Spearman’s factor theory, see Factor Analysis. 

- ρ, test of, 132.

Speed tests in children, (Example 28.4) 351-2. 

Spelling ability in children (Example 25.1) 258-9. 

Spencer’s formula in curve fitting, (Examples 29.2,
29.3) 376-7, 378-80, (Exercise 29.3) 394-5,
(Example 30.2) 405. 

Spurious correlation, Bibl. : K. Pearson (1897b) 
483, Spearman (1907, 1910) 492, Wicksell 
(1921) 499. 

Square of a variate, Bibl., Haldane (1941) 467. 

Squariance, footnote 178. 

Stabilising of variance, 207. 

Stability of series, see Lexis Theory. 

Stable laws of probability, Bibl. : Bochner (1937) 
447, Feldheim (1937a) 460, Khintchine and 
Lévy (1936) 473, Khintchine (1938) 473.

Standard deviation, estimation of, (Example 17.6) 
6-7, (Example 17.6) 11, 52. See Variance. 

- errors, in testing significance, 97-8; of
regression coefficients, 153-6. Bibl. : Derkson
(1939) 456, Edgeworth (1908, 1909)
459, Eels (1929) 459, Hendricks (1934) 468,
Isserlis (1915, 1916) 470, Miller (1934) 478,
K. Pearson (1903, 1913, 1920) 484, (1924d)
485, K. Pearson and Lee (1908) 484, K. 
Pearson and Filon (1898) 483. 

- Latin squares, 259. 

Stationary time-series, 396. Bibl.: Khintchine 
(1932, 1933, 1934) 473, Slutzky (1934) 491, 
Wold (1938a, 1939) 501. See Time-series, 
Correlogram. 

Statistical hypotheses, definition, 269; errors of
first and second kind, 270-2; power
function, 272 ; simple hypotheses, 272-5 ;
best critical regions, 277-80 ; relation with
sufficient estimators, 281-2; composite
hypotheses, 282-3 ; similar regions, 283-7 ;
of several degrees of freedom, 287 ; linear
hypotheses, 292-5 ; likelihood criteria,
295 ; k samples, 295-302 ; bias, 307-26 ;
regions of Type A, 309-14, of Type A₁,
314-16, of Type B, 316-17, of Type C,
317-22 ; limiting properties, 322 ; Pitman’s
tests, 323-6.

Bibl. : G. W. Brown (1940) 449, Chandra
Sekhar and Francis (1941) 451, Daly (1940)
454, Dantzig (1940) 455, Gumbel (1942)
466, R. W. Jackson (1936) 471, Kolodzieczyk
(1933, 1936) 474, Neyman (1936b,
1938b) 480, (1942) 481, Neyman and Pearson
(1928, 1931a, 1933a, c, 1936a, 1938)
480, E. S. Pearson (1941, 1942a) 483,
Pitman (1939b) 486, Rietz (1938) 488,
Scheffé (1942a, 1943) 490, Wald (1939a)
497, (1941a) 498, Wilks (1936c, 1938a) 499,
Wolfowitz (1942) 501.

Statistical Review of England and Wales, data from, 
(Example 21.8) 120, (Example 21.9) 121. 

Stevens, W. L., test of significance in periodogram, 
434; N.R., 216. 

Stieltjes integrals, Bibl., Shohat (1930) 491. 

Stochastic convergence, 440. See Convergence in 
Probability. 

- dependence, see Independence. 

- processes, Bibl., Doob (1934a, 1937, 1938)
457, Feller (1936a) 460. See Probability.

Stock forecasting, Bibl., Cowles (1933) 453, Cowles
and Jones (1937) 453.

Stock, J. S., N.R., 266. 

Stratified sampling, 249-62. Bibl. : P. H. Anderson
(1942) 443, Baker (1930c) 444, G. M.
Brown (1933) 449, Frankel and Stock (1939)
463, McKay (1934) 477, Mood (1943) 478.
See also Sampling, miscellaneous, Representative
Method.

“ Student ” (W. S. Gosset), see Gosset. 

Studentisation, 79-81, 134. Bibl., Hartley (1938, 
1944) 467, Newman (1939) 480. 

“ Student’s ” distribution, confidence intervals 
based on, 79-80; fiducial inference based 
on, 88; properties of, 100-2 ; in testing
mean, 98-100 ; in non-normal case, 102-4 ; 
other uses, 104; in testing two means, 
109-10, 113-14; in testing Spearman’s p, 
124 ; in Pitman’s tests, 131, 132 ; in testing 
regressions, 166, 168, 172 ; in analysis of 
covariance, 244; (Example 26.9) 291. 

Bibl.: Bartlett (1936a) 445, C. C. Craig
(1941a) 454, Daniels (1938a) 454, Fisher
(1926a) 461, Geary (1936b) 464, Hendricks
(1936) 468, P. L. Hsu (1938a) 469, N. L.
Johnson and Welch (1940a) 471, Kerrich
(1937) 473, Kolodzieczyk (1933) 474, Ladermann
(1939) 474, McKay and others (1932)
477, Merrington (1942) 478, A. N. K. Nair
(1942) 479, Perlo (1933) 486, Rider (1929)
488, Rietz (1939) 488, Steffensen (1936) 492,
“ Student ” (1908a, 1931a) 493, Treloar and
Wilder (1934) 495.

- hypothesis, 286-7. Bibl., Neyman and
Tokarska (1936b) 480, Przyborowski and
Wileński (1936a) 487.

Stumpff, K., N.R., 437. 

Sufficient estimators, 7-12 ; given by maximum
likelihood, 19; general form possessing,
24-6; distribution of, 26; when range
depends on parameter, 27-8 ; for several
parameters, 39-40; giving minimum
variance estimators, 62; relation with
confidence intervals, 74-6, 79; relation
with U.M.P. tests, 281-2, with U.M.P.U.
tests, 310.

Bibl. : Bartlett (1936b, 1937c, 1940) 445,
Darmois (1936) 455, Koopman (1936) 474,
Neyman (1936a) 480, Neyman and Pearson
(1936a) 480, Pitman (1936) 486, Welch
(1939a) 498.

Sukhatme, P. V., tables for Behrens’ test, 92, 111 ; 
(Exercise 26.8) 305-6 ; sampling moments, 
440. N.R., 94, 266, 304. 

Sum, distribution of, see Means. 

Summation convention, 329. 

Sunspots, Bibl., Schuster (1906) 490, Yule (1927a) 
503. 

Symmetric functions, Bibl., O’Toole (1931, 1932)
481. See Moments, k-statistics.

T-distribution, see Hotelling’s T. 

Tabular differences, Bibl., Ladermann and Lowan 
(1939) 474. 

Tanbum, E., N.R., 137. 

Tang, P. C., linear hypotheses, 301 ; N.R., 303. 

Tchebycheff, P. L., (Exercise 22.4) 173 ; N.R., 172.

Tchebycheff-Hermite polynomials, Bibl. : Doetsch
(1934) 457, Erdélyi (1938) 459, Feldheim
(1937b) 460. See Gram-Charlier Series,
Orthogonal Polynomials.

Tchebycheff’s inequality, Bibl. : Berge (1938)
446, Bernstein (1937) 446, Camp (1922) 450,
C. C. Craig (1933) 454, K. Pearson (1919)
485, C. D. Smith (1930) 491.

Tea-drinking, Bibl., Mahalanobis (1943) 476.

Telephone service, Bibl., Newland and Neal (1939) 
479, Palm (1937) 482. 

Terminals of frequency-distribution, confidence 
intervals for, 83. 

Test construction, Bibl., Cureton and Dunlap
(1938) 454.

Tests of significance, see Significance, Statistical 
Hypotheses. 

Tetrachoric functions, Bibl. : J. Henderson (1922)
468, K. Pearson (1912a, 1913a, b) 484, K.
Pearson and Heron (1913c) 484, Newbold
(1925) 479, Pearson and Pearson (1922b)
485.

Tetrad difference, (Exercise 28.10) 362. Bibl.,
Hotelling (1936b) 469, Wilks (1932d) 499.
See Factor Analysis. 
Third moment, distribution of, Bibl., Pepper
(1932) 486. 

Thompson, C., on A-tests, 299; N.R., 303. 

Thompson, W. R., (Exercise 19.6) 84; N.R. , 83. 

Thomson, G., (Example 25.1) 258-9.

Ties in ranking, 127, 441. 

Time-series, 363-439 ; examples of, 363-9 ; trend,
371-8 ; effect of trend elimination, 378-87 ;
variate difference method, 387-94 ; oscillations,
397-9 ; tests for randomness, 399 ;
types of oscillatory series, 395-402 ; serial
correlations, 402-4 ; correlogram, 404-13 ;
autoregressive schemes, 414-21; autocorrelation
function, 421-3 ; periodogram
analysis, 423-33 ; significance of a periodogram,
433-5 ; lag correlation, 435-7.

Bibl. : Bartels (1935) 445, Darmois (1929) 
455, Davis (1941) 455, Jones (1937b, c) 472,
Kendall (1944a, b) 473, Koopmans (1937,
1940, 1941) 474, Macaulay (1931) 476, 
Roos (1934, 1936) 489, von Szeliski (1929) 
497, Wallis and Moore (1941) 498, Wold 
(1938a) 501, Zaycoff (1936, 1937) 503. 

See also Correlogram, Harmonic Analysis, 
Periodicity. 

Tintner, G., variate-difference method, 393. N.R., 
394. 

Tokarska, B., N.R., 303. 

Tolerance limits, see Quality Control. 

Trade cycles, see Periodicity. 

Traffic signals, Bibl., Garwood (1940) 464. 

Transformation of distributions, Bibl. : Baker 
(1930a, 1934) 444, Beall (1942) 446, Bliss 
(1938) 447, Curtiss (1943) 454, Frankel and 
Hotelling (1938) 463, Landahl (1938) 474, 
Rietz (1931b) 488, Tricomi (1938) 495, 
Yasukawa (1925) 501, Zoch (1934) 503. 

Transvariation, Bibl., Castellano (1934, 1937) 451. 

Travers, R. M. W., N.R., 359. 

Trend, 369-70, 371-87. Bibl. : Lorenz (1931, 
1935) 476, Macaulay (1931) 476, Rhodes 
(1921) 488, Sasuly (1934) 490, Schumann 
(1938) 490, Sipos (1930) 491, Working and 
Hotelling (1929) 501. 

Trough, in time-series, 124. 

Truncated normal distribution, Bibl., Keyfitz 
(1938) 473, Stevens (1937a) 493. 

Turner, H. H., N.R., 437. 

Turning-point, in time-series, 124. 

Two samples, Bibl.: Behrens (1929) 446, Dixon 
(1940) 456, P. L. Hsu (1938a) 469, Lengyel 
(1939) 475, Mathisen (1943) 477, E. S. 
Pearson (1929) 482, Pearson and Neyman 
(1930) 482, K. Pearson (1911a) 484, (1931a) 
485, Peek (1937) 486, Rhodes (1924, 1925) 
488, Romanovsky (1928) 489, Starkey 
(1938) 492, Sukhatme (1935, 1936b) 493, 
Swaroop (1938) 494, W. R. Thompson 
(1933) 494, Wald and Wolfowitz (1940c) 
498, Welch (1938a) 498, Yates (1939f) 
501. 

Type A, B, C, in statistical tests, 309-27. 

Type I distribution, (Exercise 17.17) 49. 

- II distribution, Bibl., Carlson (1932) 451. 

- III distribution, estimation of parameters 
in, (Example 17.8) 20-1, (Example 17.13) 
26, (Example 17.19) 39, (Example 18.3) 
53-4 ; sufficiency, (Example 17.21) 40 ; 
centre of location of, (Example 17.23) 42 ; 
confidence intervals for parameter, (Example 
19.5) 74-5 ; fiducial distribution of parameter, 
87. Bibl. : C. C. Craig (1929a) 453, 
Kullback (1936a) 474, Olshen (1938) 481, 
Salvosa (1930) 490, Wicksell (1933) 499. 

- IV distribution, centre of location of, (Exercise 
17.15) 48 ; intrinsic accuracy of, 
(Exercise 17.19) 49. 

Unbiassed estimators, 3-4 ; confidence intervals, 
76 ; tests, 309-27. 

Unequal subclasses, in variance-analysis, 220-4. 
Bibl. : Brandt (1933) 449, Wald (1940b) 
497, (1941d) 498, Wilks (1938e) 500, Yates 
(1934a) 501. 

Uniformly most powerful tests, 276; unbiassed 
tests, 309, N.R., 359. 

U-shaped distribution, Bibl., Holzinger and Church 
(1929) 469. 

Variability, measures of, Bibl. : Castellano (1935) 
451, de Vergottini (1936) 456, Galvani 
(1931) 464, Gini (1912, 1930) 465, March 
(1926) 477, Pietra (1932a) 486, Vinci (1920) 
496. 

Variance, analysis of, see Analysis of Variance. 

-, distribution and tests of, Bibl. : Baker 
(1931, 1932, 1935, 1940) 444, Church 
(1925, 1926) 452, A. T. Craig (1932, 1938) 
453, Dunlap (1931) 458, Fertig and Proehl 
(1937) 460, Greenwood and Greville (1939) 
466, Kondo (1930) 474, Le Roux (1931) 
475, K. Pearson (1931d) 486, Quensel 
(1938) 487, Rhodes (1927) 488, Rietz (1931a) 
488, Romanovsky (1925a) 489, Truksa 
(1940) 495, von Bortkiewicz (1922) 497, 
Yasukawa (1925) 501. See also Fisher’s 
Distribution, k samples. 

-, estimation of, Bibl., O. L. Davies and 
Pearson (1934) 455, P. L. Hsu (1938b) 469. 

- ratio, Bibl.: S. S. Bose (1935) 448, Cochran 
(1941) 452, Finney (1938, 1941a) 460, 
Morgan (1939) 478, U. S. Nair (1941a, b) 
479, Scheffé (1942b) 490. See also Fisher’s 
Distribution. 

-, test of, in normal samples, 104 ; difference 
of two variances, 115, (Example 26.8) 289. 
Variate-difference method, 387-94. Bibl.: Anderson 
(1914, 1923, 1926, 1929) 443, Cave-Browne-Cave 
(1904) 451, Cave and Pearson 
(1914) 451, Haavelmo (1941) 467, K. 
Pearson and Elderton (1923a) 485, Robb 
(1929) 489, “Student” (1914) 493, Tintner 
(1935, 1940, 1941) 495, Zaycoff (1936, 1937) 
503. 

Variate transformations, in analysis of variance, 
206-9. See Transformation. 

Variation, coefficient of, Bibl.: Hendricks and 
Robey (1936) 468, McKay (1931) 477, 
McKay and others (1932) 477. 

Variety trials, Bibl., Yates (1936d, 1937a) 502. 

Vector correlation, alienation coefficients, (Exercises 
28.8, 28.9, 28.10) 361-2. 

- representation of a sample, Bibl., Bartlett 
(1934b) 445. 

von Mises, R., ω²-test, 108 ; Irregular Kollektiv, 
123. 

Wald, A., most-selective confidence intervals, 
82-3; limiting properties of tests, 322, 
N.R., 83, 304, 326. 

Walker, Sir Gilbert, time-series, 420 ; significance 
of a periodogram, 434. 

Wallace, N., N.R., 359. 

Wallis, W. A., phases in time-series, 126, N.R., 136. 

Water-content in samples, (Example 23.3) 190-4, 
(Example 23.4) 196-8. 